0a95597c9d
5 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
![]() |
96dcbc5f8c
|
Ugrade to wgpu version 25.0 (#19563)
# Objective Upgrade to `wgpu` version `25.0`. Depends on https://github.com/bevyengine/naga_oil/pull/121 ## Solution ### Problem The biggest issue we face upgrading is the following requirement: > To facilitate this change, there was an additional validation rule put in place: if there is a binding array in a bind group, you may not use dynamic offset buffers or uniform buffers in that bind group. This requirement comes from vulkan rules on UpdateAfterBind descriptors. This is a major difficulty for us, as there are a number of binding arrays that are used in the view bind group. Note, this requirement does not affect merely uniform buffors that use dynamic offset but the use of *any* uniform in a bind group that also has a binding array. ### Attempted fixes The easiest fix would be to change uniforms to be storage buffers whenever binding arrays are in use: ```wgsl #ifdef BINDING_ARRAYS_ARE_USED @group(0) @binding(0) var<uniform> view: View; @group(0) @binding(1) var<uniform> lights: types::Lights; #else @group(0) @binding(0) var<storage> view: array<View>; @group(0) @binding(1) var<storage> lights: array<types::Lights>; #endif ``` This requires passing the view index to the shader so that we know where to index into the buffer: ```wgsl struct PushConstants { view_index: u32, } var<push_constant> push_constants: PushConstants; ``` Using push constants is no problem because binding arrays are only usable on native anyway. However, this greatly complicates the ability to access `view` in shaders. For example: ```wgsl #ifdef BINDING_ARRAYS_ARE_USED mesh_view_bindings::view.view_from_world[0].z #else mesh_view_bindings::view[mesh_view_bindings::view_index].view_from_world[0].z #endif ``` Using this approach would work but would have the effect of polluting our shaders with ifdef spam basically *everywhere*. Why not use a function? Unfortunately, the following is not valid wgsl as it returns a binding directly from a function in the uniform path. ```wgsl fn get_view() -> View { #if BINDING_ARRAYS_ARE_USED let view_index = push_constants.view_index; let view = views[view_index]; #endif return view; } ``` This also poses problems for things like lights where we want to return a ptr to the light data. Returning ptrs from wgsl functions isn't allowed even if both bindings were buffers. The next attempt was to simply use indexed buffers everywhere, in both the binding array and non binding array path. This would be viable if push constants were available everywhere to pass the view index, but unfortunately they are not available on webgpu. This means either passing the view index in a storage buffer (not ideal for such a small amount of state) or using push constants sometimes and uniform buffers only on webgpu. However, this kind of conditional layout infects absolutely everything. Even if we were to accept just using storage buffer for the view index, there's also the additional problem that some dynamic offsets aren't actually per-view but per-use of a setting on a camera, which would require passing that uniform data on *every* camera regardless of whether that rendering feature is being used, which is also gross. As such, although it's gross, the simplest solution just to bump binding arrays into `@group(1)` and all other bindings up one bind group. This should still bring us under the device limit of 4 for most users. ### Next steps / looking towards the future I'd like to avoid needing split our view bind group into multiple parts. In the future, if `wgpu` were to add `@builtin(draw_index)`, we could build a list of draw state in gpu processing and avoid the need for any kind of state change at all (see https://github.com/gfx-rs/wgpu/issues/6823). This would also provide significantly more flexibility to handle things like offsets into other arrays that may not be per-view. ### Testing Tested a number of examples, there are probably more that are still broken. --------- Co-authored-by: François Mockers <mockersf@gmail.com> Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com> |
||
![]() |
913eb46324
|
Reimplement bindless storage buffers. (#17994)
Support for bindless storage buffers was temporarily removed with the bindless revamp. This commit restores that support. |
||
![]() |
28441337bb
|
Use global binding arrays for bindless resources. (#17898)
Currently, Bevy's implementation of bindless resources is rather unusual: every binding in an object that implements `AsBindGroup` (most commonly, a material) becomes its own separate binding array in the shader. This is inefficient for two reasons: 1. If multiple materials reference the same texture or other resource, the reference to that resource will be duplicated many times. This increases `wgpu` validation overhead. 2. It creates many unused binding array slots. This increases `wgpu` and driver overhead and makes it easier to hit limits on APIs that `wgpu` currently imposes tight resource limits on, like Metal. This PR fixes these issues by switching Bevy to use the standard approach in GPU-driven renderers, in which resources are de-duplicated and passed as global arrays, one for each type of resource. Along the way, this patch introduces per-platform resource limits and bumps them from 16 resources per binding array to 64 resources per bind group on Metal and 2048 resources per bind group on other platforms. (Note that the number of resources per *binding array* isn't the same as the number of resources per *bind group*; as it currently stands, if all the PBR features are turned on, Bevy could pack as many as 496 resources into a single slab.) The limits have been increased because `wgpu` now has universal support for partially-bound binding arrays, which mean that we no longer need to fill the binding arrays with fallback resources on Direct3D 12. The `#[bindless(LIMIT)]` declaration when deriving `AsBindGroup` can now simply be written `#[bindless]` in order to have Bevy choose a default limit size for the current platform. Custom limits are still available with the new `#[bindless(limit(LIMIT))]` syntax: e.g. `#[bindless(limit(8))]`. The material bind group allocator has been completely rewritten. Now there are two allocators: one for bindless materials and one for non-bindless materials. The new non-bindless material allocator simply maintains a 1:1 mapping from material to bind group. The new bindless material allocator maintains a list of slabs and allocates materials into slabs on a first-fit basis. This unfortunately makes its performance O(number of resources per object * number of slabs), but the number of slabs is likely to be low, and it's planned to become even lower in the future with `wgpu` improvements. Resources are de-duplicated with in a slab and reference counted. So, for instance, if multiple materials refer to the same texture, that texture will exist only once in the appropriate binding array. To support these new features, this patch adds the concept of a *bindless descriptor* to the `AsBindGroup` trait. The bindless descriptor allows the material bind group allocator to probe the layout of the material, now that an array of `BindGroupLayoutEntry` records is insufficient to describe the group. The `#[derive(AsBindGroup)]` has been heavily modified to support the new features. The most important user-facing change to that macro is that the struct-level `uniform` attribute, `#[uniform(BINDING_NUMBER, StandardMaterial)]`, now reads `#[uniform(BINDLESS_INDEX, MATERIAL_UNIFORM_TYPE, binding_array(BINDING_NUMBER)]`, allowing the material to specify the binding number for the binding array that holds the uniform data. To make this patch simpler, I removed support for bindless `ExtendedMaterial`s, as well as field-level bindless uniform and storage buffers. I intend to add back support for these as a follow-up. Because they aren't in any released Bevy version yet, I figured this was OK. Finally, this patch updates `StandardMaterial` for the new bindless changes. Generally, code throughout the PBR shaders that looked like `base_color_texture[slot]` now looks like `bindless_2d_textures[material_indices[slot].base_color_texture]`. This patch fixes a system hang that I experienced on the [Caldera test] when running with `caldera --random-materials --texture-count 100`. The time per frame is around 19.75 ms, down from 154.2 ms in Bevy 0.14: a 7.8× speedup. [Caldera test]: https://github.com/DGriffin91/bevy_caldera_scene |
||
![]() |
35826be6f7
|
Implement bindless lightmaps. (#16653)
This commit allows Bevy to bind 16 lightmaps at a time, if the current platform supports bindless textures. Naturally, if bindless textures aren't supported, Bevy falls back to binding only a single lightmap at a time. As lightmaps are usually heavily atlased, I doubt many scenes will use more than 16 lightmap textures. This has little performance impact now, but it's desirable for us to reap the benefits of multidraw and bindless textures on scenes that use lightmaps. Otherwise, we might have to break batches in order to switch those lightmaps. Additionally, this PR slightly reduces the cost of binning because it makes the lightmap index in `Opaque3dBinKey` 32 bits instead of an `AssetId`. ## Migration Guide * The `Opaque3dBinKey::lightmap_image` field is now `Opaque3dBinKey::lightmap_slab`, which is a lightweight identifier for an entire binding array of lightmaps. |
||
![]() |
5adf831b42
|
Add a bindless mode to AsBindGroup . (#16368)
This patch adds the infrastructure necessary for Bevy to support *bindless resources*, by adding a new `#[bindless]` attribute to `AsBindGroup`. Classically, only a single texture (or sampler, or buffer) can be attached to each shader binding. This means that switching materials requires breaking a batch and issuing a new drawcall, even if the mesh is otherwise identical. This adds significant overhead not only in the driver but also in `wgpu`, as switching bind groups increases the amount of validation work that `wgpu` must do. *Bindless resources* are the typical solution to this problem. Instead of switching bindings between each texture, the renderer instead supplies a large *array* of all textures in the scene up front, and the material contains an index into that array. This pattern is repeated for buffers and samplers as well. The renderer now no longer needs to switch binding descriptor sets while drawing the scene. Unfortunately, as things currently stand, this approach won't quite work for Bevy. Two aspects of `wgpu` conspire to make this ideal approach unacceptably slow: 1. In the DX12 backend, all binding arrays (bindless resources) must have a constant size declared in the shader, and all textures in an array must be bound to actual textures. Changing the size requires a recompile. 2. Changing even one texture incurs revalidation of all textures, a process that takes time that's linear in the total size of the binding array. This means that declaring a large array of textures big enough to encompass the entire scene is presently unacceptably slow. For example, if you declare 4096 textures, then `wgpu` will have to revalidate all 4096 textures if even a single one changes. This process can take multiple frames. To work around this problem, this PR groups bindless resources into small *slabs* and maintains a free list for each. The size of each slab for the bindless arrays associated with a material is specified via the `#[bindless(N)]` attribute. For instance, consider the following declaration: ```rust #[derive(AsBindGroup)] #[bindless(16)] struct MyMaterial { #[buffer(0)] color: Vec4, #[texture(1)] #[sampler(2)] diffuse: Handle<Image>, } ``` The `#[bindless(N)]` attribute specifies that, if bindless arrays are supported on the current platform, each resource becomes a binding array of N instances of that resource. So, for `MyMaterial` above, the `color` attribute is exposed to the shader as `binding_array<vec4<f32>, 16>`, the `diffuse` texture is exposed to the shader as `binding_array<texture_2d<f32>, 16>`, and the `diffuse` sampler is exposed to the shader as `binding_array<sampler, 16>`. Inside the material's vertex and fragment shaders, the applicable index is available via the `material_bind_group_slot` field of the `Mesh` structure. So, for instance, you can access the current color like so: ```wgsl // `uniform` binding arrays are a non-sequitur, so `uniform` is automatically promoted // to `storage` in bindless mode. @group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>; ... @fragment fn fragment(in: VertexOutput) -> @location(0) vec4<f32> { let color = material_color[mesh[in.instance_index].material_bind_group_slot]; ... } ``` Note that portable shader code can't guarantee that the current platform supports bindless textures. Indeed, bindless mode is only available in Vulkan and DX12. The `BINDLESS` shader definition is available for your use to determine whether you're on a bindless platform or not. Thus a portable version of the shader above would look like: ```wgsl #ifdef BINDLESS @group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>; #else // BINDLESS @group(2) @binding(0) var<uniform> material_color: Color; #endif // BINDLESS ... @fragment fn fragment(in: VertexOutput) -> @location(0) vec4<f32> { #ifdef BINDLESS let color = material_color[mesh[in.instance_index].material_bind_group_slot]; #else // BINDLESS let color = material_color; #endif // BINDLESS ... } ``` Importantly, this PR *doesn't* update `StandardMaterial` to be bindless. So, for example, `scene_viewer` will currently not run any faster. I intend to update `StandardMaterial` to use bindless mode in a follow-up patch. A new example, `shaders/shader_material_bindless`, has been added to demonstrate how to use this new feature. Here's a Tracy profile of `submit_graph_commands` of this patch and an additional patch (not submitted yet) that makes `StandardMaterial` use bindless. Red is those patches; yellow is `main`. The scene was Bistro Exterior with a hack that forces all textures to opaque. You can see a 1.47x mean speedup.  ## Migration Guide * `RenderAssets::prepare_asset` now takes an `AssetId` parameter. * Bin keys now have Bevy-specific material bind group indices instead of `wgpu` material bind group IDs, as part of the bindless change. Use the new `MaterialBindGroupAllocator` to map from bind group index to bind group ID. |