bevy/crates/bevy_render/src/render_resource/bindless.rs
Patrick Walton 28441337bb
Use global binding arrays for bindless resources. (#17898)
Currently, Bevy's implementation of bindless resources is rather
unusual: every binding in an object that implements `AsBindGroup` (most
commonly, a material) becomes its own separate binding array in the
shader. This is inefficient for two reasons:

1. If multiple materials reference the same texture or other resource,
the reference to that resource will be duplicated many times. This
increases `wgpu` validation overhead.

2. It creates many unused binding array slots. This increases `wgpu` and
driver overhead and makes it easier to hit limits on APIs that `wgpu`
currently imposes tight resource limits on, like Metal.

This PR fixes these issues by switching Bevy to use the standard
approach in GPU-driven renderers, in which resources are de-duplicated
and passed as global arrays, one for each type of resource.

Along the way, this patch introduces per-platform resource limits and
bumps them from 16 resources per binding array to 64 resources per bind
group on Metal and 2048 resources per bind group on other platforms.
(Note that the number of resources per *binding array* isn't the same as
the number of resources per *bind group*; as it currently stands, if all
the PBR features are turned on, Bevy could pack as many as 496 resources
into a single slab.) The limits have been increased because `wgpu` now
has universal support for partially-bound binding arrays, which means
that we no longer need to fill the binding arrays with fallback
resources on Direct3D 12. The `#[bindless(LIMIT)]` declaration when
deriving `AsBindGroup` can now simply be written `#[bindless]` in order
to have Bevy choose a default limit size for the current platform.
Custom limits are still available with the new
`#[bindless(limit(LIMIT))]` syntax: e.g. `#[bindless(limit(8))]`.
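
As a rough illustration of how the two attribute forms could map to a concrete limit, here is a minimal, dependency-free sketch. The names `SlabLimit`, `resolve`, and `slab_capacity` are hypothetical stand-ins rather than Bevy's API, `is_apple` replaces the compile-time `target_os` check so that both branches can be exercised anywhere, and the "grow past the limit when a single bind group needs more" rule is modeled here as a simple `max`, which is an assumption about the intended behavior.

```rust
/// Hypothetical mirror of the two attribute forms: `#[bindless]` (auto) and
/// `#[bindless(limit(N))]` (custom).
#[derive(Clone, Copy, Debug, PartialEq)]
enum SlabLimit {
    Auto,
    Custom(u32),
}

/// Resolves a limit to a concrete resource count. `is_apple` is a runtime
/// stand-in for a compile-time `target_os` check (an assumption made so
/// that both branches are testable on any machine).
fn resolve(limit: SlabLimit, is_apple: bool) -> u32 {
    match limit {
        SlabLimit::Auto if is_apple => 64,
        SlabLimit::Auto => 2048,
        SlabLimit::Custom(n) => n,
    }
}

/// Capacity actually used for a slab: the resolved limit, unless one bind
/// group alone needs more resources than that (assumed rule, modeled as `max`).
fn slab_capacity(limit: SlabLimit, is_apple: bool, resources_in_bind_group: u32) -> u32 {
    resolve(limit, is_apple).max(resources_in_bind_group)
}

fn main() {
    assert_eq!(resolve(SlabLimit::Auto, true), 64);
    assert_eq!(resolve(SlabLimit::Auto, false), 2048);
    assert_eq!(resolve(SlabLimit::Custom(8), false), 8);
    // A bind group with 20 resources cannot fit under a limit of 8,
    // so the capacity grows to 20 in this model.
    assert_eq!(slab_capacity(SlabLimit::Custom(8), false, 20), 20);
}
```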

The material bind group allocator has been completely rewritten. Now
there are two allocators: one for bindless materials and one for
non-bindless materials. The new non-bindless material allocator simply
maintains a 1:1 mapping from material to bind group. The new bindless
material allocator maintains a list of slabs and allocates materials
into slabs on a first-fit basis. This unfortunately makes its
performance O(number of resources per object * number of slabs), but the
number of slabs is likely to be low, and it's planned to become even
lower in the future with `wgpu` improvements. Resources are
de-duplicated within a slab and reference counted. So, for instance, if
multiple materials refer to the same texture, that texture will exist
only once in the appropriate binding array.
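
The first-fit, de-duplicating behavior described above can be sketched in a few lines of standalone Rust. This is an illustrative model only, not the allocator in this patch: `ResourceId`, `Slab`, `SlabAllocator`, and `distinct` are hypothetical names, resources are plain integers, and a single set per slab stands in for the per-type binding arrays.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a handle to a texture or sampler.
type ResourceId = u32;

/// Returns the distinct resources in `resources`.
fn distinct(resources: &[ResourceId]) -> Vec<ResourceId> {
    let mut ids = resources.to_vec();
    ids.sort_unstable();
    ids.dedup();
    ids
}

/// One slab: a bounded set of de-duplicated, reference-counted resources.
struct Slab {
    capacity: usize,
    ref_counts: HashMap<ResourceId, u32>,
}

impl Slab {
    /// True if the resources not already resident fit into the free slots.
    fn can_fit(&self, resources: &[ResourceId]) -> bool {
        let new = distinct(resources)
            .iter()
            .filter(|id| !self.ref_counts.contains_key(*id))
            .count();
        self.ref_counts.len() + new <= self.capacity
    }

    /// Adds one reference per use; a resident resource takes no new slot.
    fn insert(&mut self, resources: &[ResourceId]) {
        for &id in resources {
            *self.ref_counts.entry(id).or_insert(0) += 1;
        }
    }

    /// Drops one reference per use and frees slots whose count reaches zero.
    fn remove(&mut self, resources: &[ResourceId]) {
        for id in resources {
            if let Some(count) = self.ref_counts.get_mut(id) {
                *count = count.saturating_sub(1);
            }
        }
        self.ref_counts.retain(|_, count| *count > 0);
    }
}

struct SlabAllocator {
    slab_limit: usize,
    slabs: Vec<Slab>,
}

impl SlabAllocator {
    /// Places a material in the first slab with room; returns the slab index.
    fn allocate(&mut self, resources: &[ResourceId]) -> usize {
        let index = match self.slabs.iter().position(|slab| slab.can_fit(resources)) {
            Some(index) => index,
            None => {
                // No slab fits: open a new one, sized so this material fits
                // even if it alone exceeds the limit.
                let capacity = self.slab_limit.max(distinct(resources).len());
                self.slabs.push(Slab { capacity, ref_counts: HashMap::new() });
                self.slabs.len() - 1
            }
        };
        self.slabs[index].insert(resources);
        index
    }
}

fn main() {
    let mut allocator = SlabAllocator { slab_limit: 2, slabs: Vec::new() };
    assert_eq!(allocator.allocate(&[1, 2]), 0); // opens slab 0
    assert_eq!(allocator.allocate(&[2, 1]), 0); // shares both resources
    assert_eq!(allocator.allocate(&[3]), 1); // slab 0 is full
    assert_eq!(allocator.slabs[0].ref_counts[&2], 2);
    allocator.slabs[0].remove(&[1, 2]); // the other material still holds them
    assert_eq!(allocator.slabs[0].ref_counts.len(), 2);
}
```

The scan over `slabs` with a per-resource membership check is where the O(number of resources per object * number of slabs) cost comes from in this model.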

To support these new features, this patch adds the concept of a
*bindless descriptor* to the `AsBindGroup` trait. The bindless
descriptor allows the material bind group allocator to probe the layout
of the material, now that an array of `BindGroupLayoutEntry` records is
insufficient to describe the group. The `#[derive(AsBindGroup)]` macro has
been heavily modified to support the new features. The most important
user-facing change to that macro is that the struct-level `uniform`
attribute, `#[uniform(BINDING_NUMBER, StandardMaterial)]`, now reads
`#[uniform(BINDLESS_INDEX, MATERIAL_UNIFORM_TYPE,
binding_array(BINDING_NUMBER))]`, allowing the material to specify the
binding number for the binding array that holds the uniform data.

To make this patch simpler, I removed support for bindless
`ExtendedMaterial`s, as well as field-level bindless uniform and storage
buffers. I intend to add back support for these as a follow-up. Because
they aren't in any released Bevy version yet, I figured this was OK.

Finally, this patch updates `StandardMaterial` for the new bindless
changes. Generally, code throughout the PBR shaders that looked like
`base_color_texture[slot]` now looks like
`bindless_2d_textures[material_indices[slot].base_color_texture]`.
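
To make the extra level of indirection concrete, here is a small CPU-side model in Rust of the lookup that the shader expression performs. It is only a sketch: `MaterialIndices` is reduced to a single hypothetical field, textures are represented by names, and the function `base_color` does not exist in Bevy.

```rust
/// Hypothetical per-material record of indices into the global arrays.
/// Only one field is modeled here.
struct MaterialIndices {
    base_color_texture: u32,
}

/// Models `bindless_2d_textures[material_indices[slot].base_color_texture]`:
/// the slot selects the material's index record, and the record selects the
/// texture in the shared, de-duplicated array.
fn base_color<'a>(
    bindless_2d_textures: &[&'a str],
    material_indices: &[MaterialIndices],
    slot: usize,
) -> &'a str {
    bindless_2d_textures[material_indices[slot].base_color_texture as usize]
}

fn main() {
    // Two textures are stored once each in the global array.
    let bindless_2d_textures = ["brick.png", "wood.png"];
    // Three materials; the first and third share "brick.png".
    let material_indices = [
        MaterialIndices { base_color_texture: 0 },
        MaterialIndices { base_color_texture: 1 },
        MaterialIndices { base_color_texture: 0 },
    ];
    assert_eq!(base_color(&bindless_2d_textures, &material_indices, 0), "brick.png");
    assert_eq!(base_color(&bindless_2d_textures, &material_indices, 1), "wood.png");
    assert_eq!(base_color(&bindless_2d_textures, &material_indices, 2), "brick.png");
}
```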

This patch fixes a system hang that I experienced on the [Caldera test]
when running with `caldera --random-materials --texture-count 100`. The
time per frame is around 19.75 ms, down from 154.2 ms in Bevy 0.14: a
7.8× speedup.

[Caldera test]: https://github.com/DGriffin91/bevy_caldera_scene
2025-02-21 05:55:36 +00:00


//! Types and functions relating to bindless resources.
use alloc::borrow::Cow;
use core::num::{NonZeroU32, NonZeroU64};
use bevy_derive::{Deref, DerefMut};
use wgpu::{
    BindGroupLayoutEntry, SamplerBindingType, ShaderStages, TextureSampleType, TextureViewDimension,
};
use crate::render_resource::binding_types::storage_buffer_read_only_sized;
use super::binding_types::{
    sampler, texture_1d, texture_2d, texture_2d_array, texture_3d, texture_cube, texture_cube_array,
};
/// The default value for the number of resources that can be stored in a slab
/// on this platform.
///
/// See the documentation for [`BindlessSlabResourceLimit`] for more
/// information.
#[cfg(any(target_os = "macos", target_os = "ios"))]
pub const AUTO_BINDLESS_SLAB_RESOURCE_LIMIT: u32 = 64;
/// The default value for the number of resources that can be stored in a slab
/// on this platform.
///
/// See the documentation for [`BindlessSlabResourceLimit`] for more
/// information.
#[cfg(not(any(target_os = "macos", target_os = "ios")))]
pub const AUTO_BINDLESS_SLAB_RESOURCE_LIMIT: u32 = 2048;
/// The binding numbers for the built-in binding arrays of each bindless
/// resource type.
///
/// In the case of materials, the material allocator manages these binding
/// arrays.
///
/// `bindless.wgsl` contains declarations of these arrays for use in your
/// shaders. If you change these, make sure to update that file as well.
pub static BINDING_NUMBERS: [(BindlessResourceType, BindingNumber); 9] = [
    (BindlessResourceType::SamplerFiltering, BindingNumber(1)),
    (BindlessResourceType::SamplerNonFiltering, BindingNumber(2)),
    (BindlessResourceType::SamplerComparison, BindingNumber(3)),
    (BindlessResourceType::Texture1d, BindingNumber(4)),
    (BindlessResourceType::Texture2d, BindingNumber(5)),
    (BindlessResourceType::Texture2dArray, BindingNumber(6)),
    (BindlessResourceType::Texture3d, BindingNumber(7)),
    (BindlessResourceType::TextureCube, BindingNumber(8)),
    (BindlessResourceType::TextureCubeArray, BindingNumber(9)),
];
/// The maximum number of resources that can be stored in a slab.
///
/// This limit primarily exists in order to work around `wgpu` performance
/// problems involving large numbers of bindless resources. Also, some
/// platforms, such as Metal, currently enforce limits on the number of
/// resources in use.
///
/// This corresponds to `LIMIT` in the `#[bindless(limit(LIMIT))]` attribute
/// when deriving [`crate::render_resource::AsBindGroup`].
#[derive(Clone, Copy, Default, PartialEq, Debug)]
pub enum BindlessSlabResourceLimit {
    /// Allows the renderer to choose a reasonable value for the resource limit
    /// based on the platform.
    ///
    /// This value has been tuned, so you should default to this value unless
    /// you have special platform-specific considerations that prevent you from
    /// using it.
    #[default]
    Auto,
    /// A custom value for the resource limit.
    ///
    /// Bevy will allocate no more than this number of resources in a slab,
    /// unless exceeding this value is necessary in order to allocate at all
    /// (i.e. unless the number of bindless resources in your bind group exceeds
    /// this value), in which case Bevy can exceed it.
    Custom(u32),
}
/// Information about the bindless resources in this object.
///
/// The material bind group allocator uses this descriptor in order to create
/// and maintain bind groups. The fields within this bindless descriptor are
/// [`Cow`]s in order to efficiently support both the common case in which the
/// fields are simply `static` constants and the more unusual case in which the
/// fields are dynamically generated. An example of the latter case is
/// `ExtendedMaterial`, which needs to assemble a bindless descriptor from those
/// of the base material and the material extension at runtime.
///
/// This structure will only be present if this object is bindless.
pub struct BindlessDescriptor {
    /// The bindless resource types that this object uses, in order of bindless
    /// index.
    ///
    /// The resource assigned to binding index 0 will be at index 0, the
    /// resource assigned to binding index 1 will be at index 1 in this array,
    /// and so on. Unused binding indices are set to
    /// [`BindlessResourceType::None`].
    pub resources: Cow<'static, [BindlessResourceType]>,
    /// The [`BindlessBufferDescriptor`] for each bindless buffer that this
    /// object uses.
    ///
    /// The order of this array is irrelevant.
    pub buffers: Cow<'static, [BindlessBufferDescriptor]>,
}
/// The type of potentially-bindless resource.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Debug)]
pub enum BindlessResourceType {
    /// No bindless resource.
    ///
    /// This is used as a placeholder to fill holes in the
    /// [`BindlessDescriptor::resources`] list.
    None,
    /// A storage buffer.
    Buffer,
    /// A filtering sampler.
    SamplerFiltering,
    /// A non-filtering sampler (nearest neighbor).
    SamplerNonFiltering,
    /// A comparison sampler (typically used for shadow maps).
    SamplerComparison,
    /// A 1D texture.
    Texture1d,
    /// A 2D texture.
    Texture2d,
    /// A 2D texture array.
    ///
    /// Note that this differs from a binding array. 2D texture arrays must all
    /// have the same size and format.
    Texture2dArray,
    /// A 3D texture.
    Texture3d,
    /// A cubemap texture.
    TextureCube,
    /// A cubemap texture array.
    ///
    /// Note that this differs from a binding array. Cubemap texture arrays must
    /// all have the same size and format.
    TextureCubeArray,
}
/// Describes a bindless buffer.
///
/// Unlike samplers and textures, each buffer in a bind group gets its own
/// unique bind group entry. That is, there isn't any `bindless_buffers` binding
/// array to go along with `bindless_textures_2d`,
/// `bindless_samplers_filtering`, etc. Therefore, this descriptor contains two
/// indices: the *binding number* and the *bindless index*. The binding number
/// is the `@binding` number used in the shader, while the bindless index is the
/// index of the buffer in the bindless index table (which is itself
/// conventionally bound to binding number 0).
///
/// When declaring the buffer in a derived implementation of
/// [`crate::render_resource::AsBindGroup`] with syntax like
/// `#[uniform(BINDLESS_INDEX, StandardMaterialUniform,
/// binding_array(BINDING_NUMBER))]`, the bindless index is `BINDLESS_INDEX`,
/// and the binding number is `BINDING_NUMBER`. Note the order.
#[derive(Clone, Copy)]
pub struct BindlessBufferDescriptor {
    /// The actual binding number of the buffer.
    ///
    /// This is declared with `@binding` in WGSL. When deriving
    /// [`crate::render_resource::AsBindGroup`], this is the `BINDING_NUMBER` in
    /// `#[uniform(BINDLESS_INDEX, StandardMaterialUniform,
    /// binding_array(BINDING_NUMBER))]`.
    pub binding_number: BindingNumber,
    /// The index of the buffer in the bindless index table.
    ///
    /// In the shader, this is the index into the table bound to binding 0. When
    /// deriving [`crate::render_resource::AsBindGroup`], this is the
    /// `BINDLESS_INDEX` in `#[uniform(BINDLESS_INDEX, StandardMaterialUniform,
    /// binding_array(BINDING_NUMBER))]`.
    pub bindless_index: BindlessIndex,
    /// The size of the buffer in bytes.
    pub size: usize,
}
/// The index of the actual binding in the bind group.
///
/// This is the value specified in WGSL as `@binding`.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug, Deref, DerefMut)]
pub struct BindingNumber(pub u32);
/// The index in the bindless index table.
///
/// This table is conventionally bound to binding number 0.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug, Deref, DerefMut)]
pub struct BindlessIndex(pub u32);
/// Creates the bind group layout entries common to all shaders that use
/// bindless bind groups.
///
/// `bindless_resource_count` specifies the total number of bindless resources.
/// `bindless_slab_resource_limit` specifies the resolved
/// [`BindlessSlabResourceLimit`] value.
pub fn create_bindless_bind_group_layout_entries(
    bindless_resource_count: u32,
    bindless_slab_resource_limit: u32,
) -> Vec<BindGroupLayoutEntry> {
    let bindless_slab_resource_limit =
        NonZeroU32::new(bindless_slab_resource_limit).expect("Bindless slot count must be nonzero");
    // The maximum size of a binding array is the
    // `bindless_slab_resource_limit`, which would occur if all of the bindless
    // resources were of the same type. So we create our binding arrays with
    // that size.
    vec![
        // Start with the bindless index table, bound to binding number 0.
        storage_buffer_read_only_sized(
            false,
            NonZeroU64::new(bindless_resource_count as u64 * size_of::<u32>() as u64),
        )
        .build(0, ShaderStages::all()),
        // Continue with the common binding arrays of samplers and textures,
        // bound to binding numbers 1 through 9.
        sampler(SamplerBindingType::Filtering)
            .count(bindless_slab_resource_limit)
            .build(1, ShaderStages::all()),
        sampler(SamplerBindingType::NonFiltering)
            .count(bindless_slab_resource_limit)
            .build(2, ShaderStages::all()),
        sampler(SamplerBindingType::Comparison)
            .count(bindless_slab_resource_limit)
            .build(3, ShaderStages::all()),
        texture_1d(TextureSampleType::Float { filterable: true })
            .count(bindless_slab_resource_limit)
            .build(4, ShaderStages::all()),
        texture_2d(TextureSampleType::Float { filterable: true })
            .count(bindless_slab_resource_limit)
            .build(5, ShaderStages::all()),
        texture_2d_array(TextureSampleType::Float { filterable: true })
            .count(bindless_slab_resource_limit)
            .build(6, ShaderStages::all()),
        texture_3d(TextureSampleType::Float { filterable: true })
            .count(bindless_slab_resource_limit)
            .build(7, ShaderStages::all()),
        texture_cube(TextureSampleType::Float { filterable: true })
            .count(bindless_slab_resource_limit)
            .build(8, ShaderStages::all()),
        texture_cube_array(TextureSampleType::Float { filterable: true })
            .count(bindless_slab_resource_limit)
            .build(9, ShaderStages::all()),
    ]
}
impl BindlessSlabResourceLimit {
    /// Determines the actual bindless slab resource limit on this platform.
    pub fn resolve(&self) -> u32 {
        match *self {
            BindlessSlabResourceLimit::Auto => AUTO_BINDLESS_SLAB_RESOURCE_LIMIT,
            BindlessSlabResourceLimit::Custom(limit) => limit,
        }
    }
}
impl BindlessResourceType {
    /// Returns the binding number for the common array of this resource type.
    ///
    /// For example, if you pass `BindlessResourceType::Texture2d`, this will
    /// return 5, in order to match the `@group(2) @binding(5) var
    /// bindless_textures_2d: binding_array<texture_2d<f32>>` declaration in
    /// `bindless.wgsl`.
    ///
    /// Not all resource types have fixed binding numbers. If you call
    /// [`Self::binding_number`] on such a resource type, it returns `None`.
    ///
    /// Note that this returns a static reference to the binding number, not the
    /// binding number itself. This is to conform to an idiosyncratic API in
    /// `wgpu` whereby binding numbers for binding arrays are taken by `&u32`
    /// *reference*, not by `u32` value.
    pub fn binding_number(&self) -> Option<&'static BindingNumber> {
        // `binary_search_by_key` requires `BINDING_NUMBERS` to be sorted by
        // `BindlessResourceType`, i.e. listed in the enum's declaration order.
        match BINDING_NUMBERS.binary_search_by_key(self, |(key, _)| *key) {
            Ok(binding_number) => Some(&BINDING_NUMBERS[binding_number].1),
            Err(_) => None,
        }
    }
}
impl From<TextureViewDimension> for BindlessResourceType {
    fn from(texture_view_dimension: TextureViewDimension) -> Self {
        match texture_view_dimension {
            TextureViewDimension::D1 => BindlessResourceType::Texture1d,
            TextureViewDimension::D2 => BindlessResourceType::Texture2d,
            TextureViewDimension::D2Array => BindlessResourceType::Texture2dArray,
            TextureViewDimension::Cube => BindlessResourceType::TextureCube,
            TextureViewDimension::CubeArray => BindlessResourceType::TextureCubeArray,
            TextureViewDimension::D3 => BindlessResourceType::Texture3d,
        }
    }
}
impl From<SamplerBindingType> for BindlessResourceType {
    fn from(sampler_binding_type: SamplerBindingType) -> Self {
        match sampler_binding_type {
            SamplerBindingType::Filtering => BindlessResourceType::SamplerFiltering,
            SamplerBindingType::NonFiltering => BindlessResourceType::SamplerNonFiltering,
            SamplerBindingType::Comparison => BindlessResourceType::SamplerComparison,
        }
    }
}
impl From<u32> for BindlessIndex {
    fn from(value: u32) -> Self {
        Self(value)
    }
}
impl From<u32> for BindingNumber {
    fn from(value: u32) -> Self {
        Self(value)
    }
}