Commit Graph

5 Commits

Author SHA1 Message Date
charlotte 🌸
96dcbc5f8c
Ugrade to wgpu version 25.0 (#19563)
# Objective

Upgrade to `wgpu` version `25.0`.

Depends on https://github.com/bevyengine/naga_oil/pull/121

## Solution

### Problem

The biggest issue we face upgrading is the following requirement:
> To facilitate this change, there was an additional validation rule put
in place: if there is a binding array in a bind group, you may not use
dynamic offset buffers or uniform buffers in that bind group. This
requirement comes from vulkan rules on UpdateAfterBind descriptors.

This is a major difficulty for us, as there are a number of binding
arrays that are used in the view bind group. Note, this requirement does
not affect merely uniform buffors that use dynamic offset but the use of
*any* uniform in a bind group that also has a binding array.

### Attempted fixes

The easiest fix would be to change uniforms to be storage buffers
whenever binding arrays are in use:
```wgsl
#ifdef BINDING_ARRAYS_ARE_USED
@group(0) @binding(0) var<uniform> view: View;
@group(0) @binding(1) var<uniform> lights: types::Lights;
#else
@group(0) @binding(0) var<storage> view: array<View>;
@group(0) @binding(1) var<storage> lights: array<types::Lights>;
#endif
```

This requires passing the view index to the shader so that we know where
to index into the buffer:

```wgsl
struct PushConstants {
    view_index: u32,
}

var<push_constant> push_constants: PushConstants;
```

Using push constants is no problem because binding arrays are only
usable on native anyway.

However, this greatly complicates the ability to access `view` in
shaders. For example:
```wgsl
#ifdef BINDING_ARRAYS_ARE_USED
mesh_view_bindings::view.view_from_world[0].z
#else
mesh_view_bindings::view[mesh_view_bindings::view_index].view_from_world[0].z
#endif
```

Using this approach would work but would have the effect of polluting
our shaders with ifdef spam basically *everywhere*.

Why not use a function? Unfortunately, the following is not valid wgsl
as it returns a binding directly from a function in the uniform path.

```wgsl
fn get_view() -> View {
#if BINDING_ARRAYS_ARE_USED
    let view_index = push_constants.view_index;
    let view = views[view_index];
#endif
    return view;
}
```

This also poses problems for things like lights where we want to return
a ptr to the light data. Returning ptrs from wgsl functions isn't
allowed even if both bindings were buffers.

The next attempt was to simply use indexed buffers everywhere, in both
the binding array and non binding array path. This would be viable if
push constants were available everywhere to pass the view index, but
unfortunately they are not available on webgpu. This means either
passing the view index in a storage buffer (not ideal for such a small
amount of state) or using push constants sometimes and uniform buffers
only on webgpu. However, this kind of conditional layout infects
absolutely everything.

Even if we were to accept just using storage buffer for the view index,
there's also the additional problem that some dynamic offsets aren't
actually per-view but per-use of a setting on a camera, which would
require passing that uniform data on *every* camera regardless of
whether that rendering feature is being used, which is also gross.

As such, although it's gross, the simplest solution just to bump binding
arrays into `@group(1)` and all other bindings up one bind group. This
should still bring us under the device limit of 4 for most users.

### Next steps / looking towards the future

I'd like to avoid needing split our view bind group into multiple parts.
In the future, if `wgpu` were to add `@builtin(draw_index)`, we could
build a list of draw state in gpu processing and avoid the need for any
kind of state change at all (see
https://github.com/gfx-rs/wgpu/issues/6823). This would also provide
significantly more flexibility to handle things like offsets into other
arrays that may not be per-view.

### Testing

Tested a number of examples, there are probably more that are still
broken.

---------

Co-authored-by: François Mockers <mockersf@gmail.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
2025-06-26 19:41:47 +00:00
Patrick Walton
913eb46324
Reimplement bindless storage buffers. (#17994)
Support for bindless storage buffers was temporarily removed with the
bindless revamp. This commit restores that support.
2025-03-10 21:32:19 +00:00
Patrick Walton
28441337bb
Use global binding arrays for bindless resources. (#17898)
Currently, Bevy's implementation of bindless resources is rather
unusual: every binding in an object that implements `AsBindGroup` (most
commonly, a material) becomes its own separate binding array in the
shader. This is inefficient for two reasons:

1. If multiple materials reference the same texture or other resource,
the reference to that resource will be duplicated many times. This
increases `wgpu` validation overhead.

2. It creates many unused binding array slots. This increases `wgpu` and
driver overhead and makes it easier to hit limits on APIs that `wgpu`
currently imposes tight resource limits on, like Metal.

This PR fixes these issues by switching Bevy to use the standard
approach in GPU-driven renderers, in which resources are de-duplicated
and passed as global arrays, one for each type of resource.

Along the way, this patch introduces per-platform resource limits and
bumps them from 16 resources per binding array to 64 resources per bind
group on Metal and 2048 resources per bind group on other platforms.
(Note that the number of resources per *binding array* isn't the same as
the number of resources per *bind group*; as it currently stands, if all
the PBR features are turned on, Bevy could pack as many as 496 resources
into a single slab.) The limits have been increased because `wgpu` now
has universal support for partially-bound binding arrays, which mean
that we no longer need to fill the binding arrays with fallback
resources on Direct3D 12. The `#[bindless(LIMIT)]` declaration when
deriving `AsBindGroup` can now simply be written `#[bindless]` in order
to have Bevy choose a default limit size for the current platform.
Custom limits are still available with the new
`#[bindless(limit(LIMIT))]` syntax: e.g. `#[bindless(limit(8))]`.

The material bind group allocator has been completely rewritten. Now
there are two allocators: one for bindless materials and one for
non-bindless materials. The new non-bindless material allocator simply
maintains a 1:1 mapping from material to bind group. The new bindless
material allocator maintains a list of slabs and allocates materials
into slabs on a first-fit basis. This unfortunately makes its
performance O(number of resources per object * number of slabs), but the
number of slabs is likely to be low, and it's planned to become even
lower in the future with `wgpu` improvements. Resources are
de-duplicated with in a slab and reference counted. So, for instance, if
multiple materials refer to the same texture, that texture will exist
only once in the appropriate binding array.

To support these new features, this patch adds the concept of a
*bindless descriptor* to the `AsBindGroup` trait. The bindless
descriptor allows the material bind group allocator to probe the layout
of the material, now that an array of `BindGroupLayoutEntry` records is
insufficient to describe the group. The `#[derive(AsBindGroup)]` has
been heavily modified to support the new features. The most important
user-facing change to that macro is that the struct-level `uniform`
attribute, `#[uniform(BINDING_NUMBER, StandardMaterial)]`, now reads
`#[uniform(BINDLESS_INDEX, MATERIAL_UNIFORM_TYPE,
binding_array(BINDING_NUMBER)]`, allowing the material to specify the
binding number for the binding array that holds the uniform data.

To make this patch simpler, I removed support for bindless
`ExtendedMaterial`s, as well as field-level bindless uniform and storage
buffers. I intend to add back support for these as a follow-up. Because
they aren't in any released Bevy version yet, I figured this was OK.

Finally, this patch updates `StandardMaterial` for the new bindless
changes. Generally, code throughout the PBR shaders that looked like
`base_color_texture[slot]` now looks like
`bindless_2d_textures[material_indices[slot].base_color_texture]`.

This patch fixes a system hang that I experienced on the [Caldera test]
when running with `caldera --random-materials --texture-count 100`. The
time per frame is around 19.75 ms, down from 154.2 ms in Bevy 0.14: a
7.8× speedup.

[Caldera test]: https://github.com/DGriffin91/bevy_caldera_scene
2025-02-21 05:55:36 +00:00
Patrick Walton
35826be6f7
Implement bindless lightmaps. (#16653)
This commit allows Bevy to bind 16 lightmaps at a time, if the current
platform supports bindless textures. Naturally, if bindless textures
aren't supported, Bevy falls back to binding only a single lightmap at a
time. As lightmaps are usually heavily atlased, I doubt many scenes will
use more than 16 lightmap textures.

This has little performance impact now, but it's desirable for us to
reap the benefits of multidraw and bindless textures on scenes that use
lightmaps. Otherwise, we might have to break batches in order to switch
those lightmaps.

Additionally, this PR slightly reduces the cost of binning because it
makes the lightmap index in `Opaque3dBinKey` 32 bits instead of an
`AssetId`.

## Migration Guide

* The `Opaque3dBinKey::lightmap_image` field is now
`Opaque3dBinKey::lightmap_slab`, which is a lightweight identifier for
an entire binding array of lightmaps.
2024-12-16 23:37:06 +00:00
Patrick Walton
5adf831b42
Add a bindless mode to AsBindGroup. (#16368)
This patch adds the infrastructure necessary for Bevy to support
*bindless resources*, by adding a new `#[bindless]` attribute to
`AsBindGroup`.

Classically, only a single texture (or sampler, or buffer) can be
attached to each shader binding. This means that switching materials
requires breaking a batch and issuing a new drawcall, even if the mesh
is otherwise identical. This adds significant overhead not only in the
driver but also in `wgpu`, as switching bind groups increases the amount
of validation work that `wgpu` must do.

*Bindless resources* are the typical solution to this problem. Instead
of switching bindings between each texture, the renderer instead
supplies a large *array* of all textures in the scene up front, and the
material contains an index into that array. This pattern is repeated for
buffers and samplers as well. The renderer now no longer needs to switch
binding descriptor sets while drawing the scene.

Unfortunately, as things currently stand, this approach won't quite work
for Bevy. Two aspects of `wgpu` conspire to make this ideal approach
unacceptably slow:

1. In the DX12 backend, all binding arrays (bindless resources) must
have a constant size declared in the shader, and all textures in an
array must be bound to actual textures. Changing the size requires a
recompile.

2. Changing even one texture incurs revalidation of all textures, a
process that takes time that's linear in the total size of the binding
array.

This means that declaring a large array of textures big enough to
encompass the entire scene is presently unacceptably slow. For example,
if you declare 4096 textures, then `wgpu` will have to revalidate all
4096 textures if even a single one changes. This process can take
multiple frames.

To work around this problem, this PR groups bindless resources into
small *slabs* and maintains a free list for each. The size of each slab
for the bindless arrays associated with a material is specified via the
`#[bindless(N)]` attribute. For instance, consider the following
declaration:

```rust
#[derive(AsBindGroup)]
#[bindless(16)]
struct MyMaterial {
    #[buffer(0)]
    color: Vec4,
    #[texture(1)]
    #[sampler(2)]
    diffuse: Handle<Image>,
}
```

The `#[bindless(N)]` attribute specifies that, if bindless arrays are
supported on the current platform, each resource becomes a binding array
of N instances of that resource. So, for `MyMaterial` above, the `color`
attribute is exposed to the shader as `binding_array<vec4<f32>, 16>`,
the `diffuse` texture is exposed to the shader as
`binding_array<texture_2d<f32>, 16>`, and the `diffuse` sampler is
exposed to the shader as `binding_array<sampler, 16>`. Inside the
material's vertex and fragment shaders, the applicable index is
available via the `material_bind_group_slot` field of the `Mesh`
structure. So, for instance, you can access the current color like so:

```wgsl
// `uniform` binding arrays are a non-sequitur, so `uniform` is automatically promoted
// to `storage` in bindless mode.
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
    ...
}
```

Note that portable shader code can't guarantee that the current platform
supports bindless textures. Indeed, bindless mode is only available in
Vulkan and DX12. The `BINDLESS` shader definition is available for your
use to determine whether you're on a bindless platform or not. Thus a
portable version of the shader above would look like:

```wgsl
#ifdef BINDLESS
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
#else // BINDLESS
@group(2) @binding(0) var<uniform> material_color: Color;
#endif // BINDLESS
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
#ifdef BINDLESS
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
#else // BINDLESS
    let color = material_color;
#endif // BINDLESS
    ...
}
```

Importantly, this PR *doesn't* update `StandardMaterial` to be bindless.
So, for example, `scene_viewer` will currently not run any faster. I
intend to update `StandardMaterial` to use bindless mode in a follow-up
patch.

A new example, `shaders/shader_material_bindless`, has been added to
demonstrate how to use this new feature.

Here's a Tracy profile of `submit_graph_commands` of this patch and an
additional patch (not submitted yet) that makes `StandardMaterial` use
bindless. Red is those patches; yellow is `main`. The scene was Bistro
Exterior with a hack that forces all textures to opaque. You can see a
1.47x mean speedup.
![Screenshot 2024-11-12
161713](https://github.com/user-attachments/assets/4334b362-42c8-4d64-9cfb-6835f019b95c)

## Migration Guide

* `RenderAssets::prepare_asset` now takes an `AssetId` parameter.
* Bin keys now have Bevy-specific material bind group indices instead of
`wgpu` material bind group IDs, as part of the bindless change. Use the
new `MaterialBindGroupAllocator` to map from bind group index to bind
group ID.
2024-12-03 18:00:34 +00:00