96dcbc5f8c
14 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
96dcbc5f8c
|
Ugrade to wgpu version 25.0 (#19563)
# Objective Upgrade to `wgpu` version `25.0`. Depends on https://github.com/bevyengine/naga_oil/pull/121 ## Solution ### Problem The biggest issue we face upgrading is the following requirement: > To facilitate this change, there was an additional validation rule put in place: if there is a binding array in a bind group, you may not use dynamic offset buffers or uniform buffers in that bind group. This requirement comes from vulkan rules on UpdateAfterBind descriptors. This is a major difficulty for us, as there are a number of binding arrays that are used in the view bind group. Note, this requirement does not affect merely uniform buffors that use dynamic offset but the use of *any* uniform in a bind group that also has a binding array. ### Attempted fixes The easiest fix would be to change uniforms to be storage buffers whenever binding arrays are in use: ```wgsl #ifdef BINDING_ARRAYS_ARE_USED @group(0) @binding(0) var<uniform> view: View; @group(0) @binding(1) var<uniform> lights: types::Lights; #else @group(0) @binding(0) var<storage> view: array<View>; @group(0) @binding(1) var<storage> lights: array<types::Lights>; #endif ``` This requires passing the view index to the shader so that we know where to index into the buffer: ```wgsl struct PushConstants { view_index: u32, } var<push_constant> push_constants: PushConstants; ``` Using push constants is no problem because binding arrays are only usable on native anyway. However, this greatly complicates the ability to access `view` in shaders. For example: ```wgsl #ifdef BINDING_ARRAYS_ARE_USED mesh_view_bindings::view.view_from_world[0].z #else mesh_view_bindings::view[mesh_view_bindings::view_index].view_from_world[0].z #endif ``` Using this approach would work but would have the effect of polluting our shaders with ifdef spam basically *everywhere*. Why not use a function? Unfortunately, the following is not valid wgsl as it returns a binding directly from a function in the uniform path. ```wgsl fn get_view() -> View { #if BINDING_ARRAYS_ARE_USED let view_index = push_constants.view_index; let view = views[view_index]; #endif return view; } ``` This also poses problems for things like lights where we want to return a ptr to the light data. Returning ptrs from wgsl functions isn't allowed even if both bindings were buffers. The next attempt was to simply use indexed buffers everywhere, in both the binding array and non binding array path. This would be viable if push constants were available everywhere to pass the view index, but unfortunately they are not available on webgpu. This means either passing the view index in a storage buffer (not ideal for such a small amount of state) or using push constants sometimes and uniform buffers only on webgpu. However, this kind of conditional layout infects absolutely everything. Even if we were to accept just using storage buffer for the view index, there's also the additional problem that some dynamic offsets aren't actually per-view but per-use of a setting on a camera, which would require passing that uniform data on *every* camera regardless of whether that rendering feature is being used, which is also gross. As such, although it's gross, the simplest solution just to bump binding arrays into `@group(1)` and all other bindings up one bind group. This should still bring us under the device limit of 4 for most users. ### Next steps / looking towards the future I'd like to avoid needing split our view bind group into multiple parts. In the future, if `wgpu` were to add `@builtin(draw_index)`, we could build a list of draw state in gpu processing and avoid the need for any kind of state change at all (see https://github.com/gfx-rs/wgpu/issues/6823). This would also provide significantly more flexibility to handle things like offsets into other arrays that may not be per-view. ### Testing Tested a number of examples, there are probably more that are still broken. --------- Co-authored-by: François Mockers <mockersf@gmail.com> Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> |
||
![]() |
e9a0ef49f9
|
Rename bevy_platform_support to bevy_platform (#18813)
# Objective The goal of `bevy_platform_support` is to provide a set of platform agnostic APIs, alongside platform-specific functionality. This is a high traffic crate (providing things like HashMap and Instant). Especially in light of https://github.com/bevyengine/bevy/discussions/18799, it deserves a friendlier / shorter name. Given that it hasn't had a full release yet, getting this change in before Bevy 0.16 makes sense. ## Solution - Rename `bevy_platform_support` to `bevy_platform`. |
||
![]() |
6c619397d5
|
Unify RenderMaterialInstances and RenderMeshMaterialIds , and fix an associated race condition. (#18734)
Currently, `RenderMaterialInstances` and `RenderMeshMaterialIds` are very similar render-world resources: the former maps main world meshes to typed material asset IDs, and the latter maps main world meshes to untyped material asset IDs. This is needlessly-complex and wasteful, so this patch unifies the two in favor of a single untyped `RenderMaterialInstances` resource. This patch also fixes a subtle issue that could cause mesh materials to be incorrect if a `MeshMaterial3d<A>` was removed and replaced with a `MeshMaterial3d<B>` material in the same frame. The problematic pattern looks like: 1. `extract_mesh_materials<B>` runs and, seeing the `Changed<MeshMaterial3d<B>>` condition, adds an entry mapping the mesh to the new material to the untyped `RenderMeshMaterialIds`. 2. `extract_mesh_materials<A>` runs and, seeing that the entity is present in `RemovedComponents<MeshMaterial3d<A>>`, removes the entry from `RenderMeshMaterialIds`. 3. The material slot is now empty, and the mesh will show up as whatever material happens to be in slot 0 in the material data slab. This commit fixes the issue by splitting out `extract_mesh_materials` into *three* phases: *extraction*, *early sweeping*, and *late sweeping*, which run in that order: 1. The *extraction* system, which runs for each material, updates `RenderMaterialInstances` records whenever `MeshMaterial3d` components change, and updates a change tick so that the following system will know not to remove it. 2. The *early sweeping* system, which runs for each material, processes entities present in `RemovedComponents<MeshMaterial3d>` and removes each such entity's record from `RenderMeshInstances` only if the extraction system didn't update it this frame. This system runs after *all* extraction systems have completed, fixing the race condition. 3. The *late sweeping* system, which runs only once regardless of the number of materials in the scene, processes entities present in `RemovedComponents<ViewVisibility>` and, as in the early sweeping phase, removes each such entity's record from `RenderMeshInstances` only if the extraction system didn't update it this frame. At the end, the late sweeping system updates the change tick. Because this pattern happens relatively frequently, I think this PR should land for 0.16. |
||
![]() |
65d9a7535a
|
Move non-generic parts of the PrepassPipeline to internal field (#18322)
# Objective - The prepass pipeline has a generic bound on the specialize function but 95% of it doesn't need it ## Solution - Move most of the fields to an internal struct and use a separate specialize function for those fields ## Testing - Ran the 3d_scene and it worked like before --- ## Migration Guide If you were using a field of the `PrepassPipeline`, most of them have now been move to `PrepassPipeline::internal`. ## Notes Here's the cargo bloat size comparison (from this tool https://github.com/bevyengine/bevy/discussions/14864): ``` before: ( "<bevy_pbr::prepass::PrepassPipeline<M> as bevy_render::render_resource::pipeline_specializer::SpecializedMeshPipeline>::specialize", 25416, 0.05582993, ), after: ( "<bevy_pbr::prepass::PrepassPipeline<M> as bevy_render::render_resource::pipeline_specializer::SpecializedMeshPipeline>::specialize", 2496, 0.005490916, ), ( "bevy_pbr::prepass::PrepassPipelineInternal::specialize", 11444, 0.025175499, ), ``` The size for the specialize function that is generic is now much smaller, so users won't need to recompile it for every material. |
||
![]() |
28441337bb
|
Use global binding arrays for bindless resources. (#17898)
Currently, Bevy's implementation of bindless resources is rather unusual: every binding in an object that implements `AsBindGroup` (most commonly, a material) becomes its own separate binding array in the shader. This is inefficient for two reasons: 1. If multiple materials reference the same texture or other resource, the reference to that resource will be duplicated many times. This increases `wgpu` validation overhead. 2. It creates many unused binding array slots. This increases `wgpu` and driver overhead and makes it easier to hit limits on APIs that `wgpu` currently imposes tight resource limits on, like Metal. This PR fixes these issues by switching Bevy to use the standard approach in GPU-driven renderers, in which resources are de-duplicated and passed as global arrays, one for each type of resource. Along the way, this patch introduces per-platform resource limits and bumps them from 16 resources per binding array to 64 resources per bind group on Metal and 2048 resources per bind group on other platforms. (Note that the number of resources per *binding array* isn't the same as the number of resources per *bind group*; as it currently stands, if all the PBR features are turned on, Bevy could pack as many as 496 resources into a single slab.) The limits have been increased because `wgpu` now has universal support for partially-bound binding arrays, which mean that we no longer need to fill the binding arrays with fallback resources on Direct3D 12. The `#[bindless(LIMIT)]` declaration when deriving `AsBindGroup` can now simply be written `#[bindless]` in order to have Bevy choose a default limit size for the current platform. Custom limits are still available with the new `#[bindless(limit(LIMIT))]` syntax: e.g. `#[bindless(limit(8))]`. The material bind group allocator has been completely rewritten. Now there are two allocators: one for bindless materials and one for non-bindless materials. The new non-bindless material allocator simply maintains a 1:1 mapping from material to bind group. The new bindless material allocator maintains a list of slabs and allocates materials into slabs on a first-fit basis. This unfortunately makes its performance O(number of resources per object * number of slabs), but the number of slabs is likely to be low, and it's planned to become even lower in the future with `wgpu` improvements. Resources are de-duplicated with in a slab and reference counted. So, for instance, if multiple materials refer to the same texture, that texture will exist only once in the appropriate binding array. To support these new features, this patch adds the concept of a *bindless descriptor* to the `AsBindGroup` trait. The bindless descriptor allows the material bind group allocator to probe the layout of the material, now that an array of `BindGroupLayoutEntry` records is insufficient to describe the group. The `#[derive(AsBindGroup)]` has been heavily modified to support the new features. The most important user-facing change to that macro is that the struct-level `uniform` attribute, `#[uniform(BINDING_NUMBER, StandardMaterial)]`, now reads `#[uniform(BINDLESS_INDEX, MATERIAL_UNIFORM_TYPE, binding_array(BINDING_NUMBER)]`, allowing the material to specify the binding number for the binding array that holds the uniform data. To make this patch simpler, I removed support for bindless `ExtendedMaterial`s, as well as field-level bindless uniform and storage buffers. I intend to add back support for these as a follow-up. Because they aren't in any released Bevy version yet, I figured this was OK. Finally, this patch updates `StandardMaterial` for the new bindless changes. Generally, code throughout the PBR shaders that looked like `base_color_texture[slot]` now looks like `bindless_2d_textures[material_indices[slot].base_color_texture]`. This patch fixes a system hang that I experienced on the [Caldera test] when running with `caldera --random-materials --texture-count 100`. The time per frame is around 19.75 ms, down from 154.2 ms in Bevy 0.14: a 7.8× speedup. [Caldera test]: https://github.com/DGriffin91/bevy_caldera_scene |
||
![]() |
9bc0ae33c3
|
Move hashbrown and foldhash out of bevy_utils (#17460)
# Objective - Contributes to #16877 ## Solution - Moved `hashbrown`, `foldhash`, and related types out of `bevy_utils` and into `bevy_platform_support` - Refactored the above to match the layout of these types in `std`. - Updated crates as required. ## Testing - CI --- ## Migration Guide - The following items were moved out of `bevy_utils` and into `bevy_platform_support::hash`: - `FixedState` - `DefaultHasher` - `RandomState` - `FixedHasher` - `Hashed` - `PassHash` - `PassHasher` - `NoOpHash` - The following items were moved out of `bevy_utils` and into `bevy_platform_support::collections`: - `HashMap` - `HashSet` - `bevy_utils::hashbrown` has been removed. Instead, import from `bevy_platform_support::collections` _or_ take a dependency on `hashbrown` directly. - `bevy_utils::Entry` has been removed. Instead, import from `bevy_platform_support::collections::hash_map` or `bevy_platform_support::collections::hash_set` as appropriate. - All of the above equally apply to `bevy::utils` and `bevy::platform_support`. ## Notes - I left `PreHashMap`, `PreHashMapExt`, and `TypeIdMap` in `bevy_utils` as they might be candidates for micro-crating. They can always be moved into `bevy_platform_support` at a later date if desired. |
||
![]() |
56aa90240e
|
Only include distance fog in the PBR shader if the view uses it. (#17495)
Right now, we always include distance fog in the shader, which is unfortunate as it's complex code and is rare. This commit changes it to be a `#define` instead. I haven't confirmed that removing distance fog meaningfully reduces VGPR usage, but it can't hurt. |
||
![]() |
3742e621ef
|
Allow clippy::too_many_arguments to lint without warnings (#17249)
# Objective Many instances of `clippy::too_many_arguments` linting happen to be on systems - functions which we don't call manually, and thus there's not much reason to worry about the argument count. ## Solution Allow `clippy::too_many_arguments` globally, and remove all lint attributes related to it. |
||
![]() |
bed9ddf3ce
|
Refactor and simplify custom projections (#17063)
# Objective - Fixes https://github.com/bevyengine/bevy/issues/16556 - Closes https://github.com/bevyengine/bevy/issues/11807 ## Solution - Simplify custom projections by using a single source of truth - `Projection`, removing all existing generic systems and types. - Existing perspective and orthographic structs are no longer components - I could dissolve these to simplify further, but keeping them around was the fast way to implement this. - Instead of generics, introduce a third variant, with a trait object. - Do an object safety dance with an intermediate trait to allow cloning boxed camera projections. This is a normal rust polymorphism papercut. You can do this with a crate but a manual impl is short and sweet. ## Testing - Added a custom projection example --- ## Showcase - Custom projections and projection handling has been simplified. - Projection systems are no longer generic, with the potential for many different projection components on the same camera. - Instead `Projection` is now the single source of truth for camera projections, and is the only projection component. - Custom projections are still supported, and can be constructed with `Projection::custom()`. ## Migration Guide - `PerspectiveProjection` and `OrthographicProjection` are no longer components. Use `Projection` instead. - Custom projections should no longer be inserted as a component. Instead, simply set the custom projection as a value of `Projection` with `Projection::custom()`. |
||
![]() |
5adf831b42
|
Add a bindless mode to AsBindGroup . (#16368)
This patch adds the infrastructure necessary for Bevy to support *bindless resources*, by adding a new `#[bindless]` attribute to `AsBindGroup`. Classically, only a single texture (or sampler, or buffer) can be attached to each shader binding. This means that switching materials requires breaking a batch and issuing a new drawcall, even if the mesh is otherwise identical. This adds significant overhead not only in the driver but also in `wgpu`, as switching bind groups increases the amount of validation work that `wgpu` must do. *Bindless resources* are the typical solution to this problem. Instead of switching bindings between each texture, the renderer instead supplies a large *array* of all textures in the scene up front, and the material contains an index into that array. This pattern is repeated for buffers and samplers as well. The renderer now no longer needs to switch binding descriptor sets while drawing the scene. Unfortunately, as things currently stand, this approach won't quite work for Bevy. Two aspects of `wgpu` conspire to make this ideal approach unacceptably slow: 1. In the DX12 backend, all binding arrays (bindless resources) must have a constant size declared in the shader, and all textures in an array must be bound to actual textures. Changing the size requires a recompile. 2. Changing even one texture incurs revalidation of all textures, a process that takes time that's linear in the total size of the binding array. This means that declaring a large array of textures big enough to encompass the entire scene is presently unacceptably slow. For example, if you declare 4096 textures, then `wgpu` will have to revalidate all 4096 textures if even a single one changes. This process can take multiple frames. To work around this problem, this PR groups bindless resources into small *slabs* and maintains a free list for each. The size of each slab for the bindless arrays associated with a material is specified via the `#[bindless(N)]` attribute. For instance, consider the following declaration: ```rust #[derive(AsBindGroup)] #[bindless(16)] struct MyMaterial { #[buffer(0)] color: Vec4, #[texture(1)] #[sampler(2)] diffuse: Handle<Image>, } ``` The `#[bindless(N)]` attribute specifies that, if bindless arrays are supported on the current platform, each resource becomes a binding array of N instances of that resource. So, for `MyMaterial` above, the `color` attribute is exposed to the shader as `binding_array<vec4<f32>, 16>`, the `diffuse` texture is exposed to the shader as `binding_array<texture_2d<f32>, 16>`, and the `diffuse` sampler is exposed to the shader as `binding_array<sampler, 16>`. Inside the material's vertex and fragment shaders, the applicable index is available via the `material_bind_group_slot` field of the `Mesh` structure. So, for instance, you can access the current color like so: ```wgsl // `uniform` binding arrays are a non-sequitur, so `uniform` is automatically promoted // to `storage` in bindless mode. @group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>; ... @fragment fn fragment(in: VertexOutput) -> @location(0) vec4<f32> { let color = material_color[mesh[in.instance_index].material_bind_group_slot]; ... } ``` Note that portable shader code can't guarantee that the current platform supports bindless textures. Indeed, bindless mode is only available in Vulkan and DX12. The `BINDLESS` shader definition is available for your use to determine whether you're on a bindless platform or not. Thus a portable version of the shader above would look like: ```wgsl #ifdef BINDLESS @group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>; #else // BINDLESS @group(2) @binding(0) var<uniform> material_color: Color; #endif // BINDLESS ... @fragment fn fragment(in: VertexOutput) -> @location(0) vec4<f32> { #ifdef BINDLESS let color = material_color[mesh[in.instance_index].material_bind_group_slot]; #else // BINDLESS let color = material_color; #endif // BINDLESS ... } ``` Importantly, this PR *doesn't* update `StandardMaterial` to be bindless. So, for example, `scene_viewer` will currently not run any faster. I intend to update `StandardMaterial` to use bindless mode in a follow-up patch. A new example, `shaders/shader_material_bindless`, has been added to demonstrate how to use this new feature. Here's a Tracy profile of `submit_graph_commands` of this patch and an additional patch (not submitted yet) that makes `StandardMaterial` use bindless. Red is those patches; yellow is `main`. The scene was Bistro Exterior with a hack that forces all textures to opaque. You can see a 1.47x mean speedup.  ## Migration Guide * `RenderAssets::prepare_asset` now takes an `AssetId` parameter. * Bin keys now have Bevy-specific material bind group indices instead of `wgpu` material bind group IDs, as part of the bindless change. Use the new `MaterialBindGroupAllocator` to map from bind group index to bind group ID. |
||
![]() |
c29e67153b
|
Expose Pipeline Compilation Zero Initialize Workgroup Memory Option (#16301)
# Objective - wgpu 0.20 made workgroup vars stop being zero-init by default. this broke some applications (cough foresight cough) and now we workaround it. wgpu exposes a compilation option that zero initializes workgroup memory by default, but bevy does not expose it. ## Solution - expose the compilation option wgpu gives us ## Testing - ran examples: 3d_scene, compute_shader_game_of_life, gpu_readback, lines, specialized_mesh_pipeline. they all work - confirmed fix for our own problems --- </details> ## Migration Guide - add `zero_initialize_workgroup_memory: false,` to `ComputePipelineDescriptor` or `RenderPipelineDescriptor` structs to preserve 0.14 functionality, add `zero_initialize_workgroup_memory: true,` to restore bevy 0.13 functionality. |
||
![]() |
d70595b667
|
Add core and alloc over std Lints (#15281)
# Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com> |
||
![]() |
afbbbd7335
|
Rename rendering components for improved consistency and clarity (#15035)
# Objective The names of numerous rendering components in Bevy are inconsistent and a bit confusing. Relevant names include: - `AutoExposureSettings` - `AutoExposureSettingsUniform` - `BloomSettings` - `BloomUniform` (no `Settings`) - `BloomPrefilterSettings` - `ChromaticAberration` (no `Settings`) - `ContrastAdaptiveSharpeningSettings` - `DepthOfFieldSettings` - `DepthOfFieldUniform` (no `Settings`) - `FogSettings` - `SmaaSettings`, `Fxaa`, `TemporalAntiAliasSettings` (really inconsistent??) - `ScreenSpaceAmbientOcclusionSettings` - `ScreenSpaceReflectionsSettings` - `VolumetricFogSettings` Firstly, there's a lot of inconsistency between `Foo`/`FooSettings` and `FooUniform`/`FooSettingsUniform` and whether names are abbreviated or not. Secondly, the `Settings` post-fix seems unnecessary and a bit confusing semantically, since it makes it seem like the component is mostly just auxiliary configuration instead of the core *thing* that actually enables the feature. This will be an even bigger problem once bundles like `TemporalAntiAliasBundle` are deprecated in favor of required components, as users will expect a component named `TemporalAntiAlias` (or similar), not `TemporalAntiAliasSettings`. ## Solution Drop the `Settings` post-fix from the component names, and change some names to be more consistent. - `AutoExposure` - `AutoExposureUniform` - `Bloom` - `BloomUniform` - `BloomPrefilter` - `ChromaticAberration` - `ContrastAdaptiveSharpening` - `DepthOfField` - `DepthOfFieldUniform` - `DistanceFog` - `Smaa`, `Fxaa`, `TemporalAntiAliasing` (note: we might want to change to `Taa`, see "Discussion") - `ScreenSpaceAmbientOcclusion` - `ScreenSpaceReflections` - `VolumetricFog` I kept the old names as deprecated type aliases to make migration a bit less painful for users. We should remove them after the next release. (And let me know if I should just... not add them at all) I also added some very basic docs for a few types where they were missing, like on `Fxaa` and `DepthOfField`. ## Discussion - `TemporalAntiAliasing` is still inconsistent with `Smaa` and `Fxaa`. Consensus [on Discord](https://discord.com/channels/691052431525675048/743663924229963868/1280601167209955431) seemed to be that renaming to `Taa` would probably be fine, but I think it's a bit more controversial, and it would've required renaming a lot of related types like `TemporalAntiAliasNode`, `TemporalAntiAliasBundle`, and `TemporalAntiAliasPlugin`, so I think it's better to leave to a follow-up. - I think `Fog` should probably have a more specific name like `DistanceFog` considering it seems to be distinct from `VolumetricFog`. ~~This should probably be done in a follow-up though, so I just removed the `Settings` post-fix for now.~~ (done) --- ## Migration Guide Many rendering components have been renamed for improved consistency and clarity. - `AutoExposureSettings` → `AutoExposure` - `BloomSettings` → `Bloom` - `BloomPrefilterSettings` → `BloomPrefilter` - `ContrastAdaptiveSharpeningSettings` → `ContrastAdaptiveSharpening` - `DepthOfFieldSettings` → `DepthOfField` - `FogSettings` → `DistanceFog` - `SmaaSettings` → `Smaa` - `TemporalAntiAliasSettings` → `TemporalAntiAliasing` - `ScreenSpaceAmbientOcclusionSettings` → `ScreenSpaceAmbientOcclusion` - `ScreenSpaceReflectionsSettings` → `ScreenSpaceReflections` - `VolumetricFogSettings` → `VolumetricFog` --------- Co-authored-by: Carter Anderson <mcanders1@gmail.com> |
||
![]() |
6cc96f4c1f
|
Meshlet software raster + start of cleanup (#14623)
# Objective - Faster meshlet rasterization path for small triangles - Avoid having to allocate and write out a triangle buffer - Refactor gpu_scene.rs ## Solution - Replace the 32bit visbuffer texture with a 64bit visbuffer buffer, where the left 32 bits encode depth, and the right 32 bits encode the existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga doesn't support atomic ops on textures yet. - Instead of writing out a buffer of packed cluster + triangle IDs (per triangle) to raster, the culling pass now writes out a buffer of just cluster IDs (per cluster, so less memory allocated, cheaper to write out). - Clusters for software raster are allocated from the left side - Clusters for hardware raster are allocated in the same buffer, from the right side - The buffer size is fixed at MeshletPlugin build time, and should be set to a reasonable value for your scene (no warning on overflow, and no good way to determine what value you need outside of renderdoc - I plan to fix this in a future PR adding a meshlet stats overlay) - Currently I don't have a heuristic for software vs hardware raster selection for each cluster. The existing code is just a placeholder. I need to profile on a release scene and come up with a heuristic, probably in a future PR. - The culling shader is getting pretty hard to follow at this point, but I don't want to spend time improving it as the entire shader/pass is getting rewritten/replaced in the near future. - Software raster is a compute workgroup per-cluster. Each workgroup loads and transforms the <=64 vertices of the cluster, and then rasterizes the <=64 triangles of the cluster. - Two variants are implemented: Scanline for clusters with any larger triangles (still smaller than hardware is good at), and brute-force for very very tiny triangles - Once the shader determines that a pixel should be filled in, it does an atomicMax() on the visbuffer to store the results, copying how Nanite works - On devices with a low max workgroups per dispatch limit, an extra compute pass is inserted before software raster to convert from a 1d to 2d dispatch (I don't think 3d would ever be necessary). - I haven't implemented the top-left rule or subpixel precision yet, I'm leaving that for a future PR since I get usable results without it for now - Resources used: https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters 6-8 of https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index - Hardware raster now spawns 64*3 vertex invocations per meshlet, instead of the actual meshlet vertex count. Extra invocations just early-exit. - While this is slower than the existing system, hardware draws should be rare now that software raster is usable, and it saves a ton of memory using the unified cluster ID buffer. This would be fixed if wgpu had support for mesh shaders. - Instead of writing to a color+depth attachment, the hardware raster pass also does the same atomic visbuffer writes that software raster uses. - We have to bind a dummy render target anyways, as wgpu doesn't currently support render passes without any attachments - Material IDs are no longer written out during the main rasterization passes. - If we had async compute queues, we could overlap the software and hardware raster passes. - New material and depth resolve passes run at the end of the visbuffer node, and write out view depth and material ID depth textures ### Misc changes - Fixed cluster culling importing, but never actually using the previous view uniforms when doing occlusion culling - Fixed incorrectly adding the LOD error twice when building the meshlet mesh - Splitup gpu_scene module into meshlet_mesh_manager, instance_manager, and resource_manager - resource_manager is still too complex and inefficient (extract and prepare are way too expensive). I plan on improving this in a future PR, but for now ResourceManager is mostly a 1:1 port of the leftover MeshletGpuScene bits. - Material draw passes have been renamed to the more accurate material shade pass, as well as some other misc renaming (in the future, these will be compute shaders even, and not actual draw calls) --- ## Migration Guide - TBD (ask me at the end of the release for meshlet changes as a whole) --------- Co-authored-by: vero <email@atlasdostal.com> |