bevy/crates/bevy_render/src/render_asset.rs
Patrick Walton 5adf831b42
Add a bindless mode to AsBindGroup. (#16368)
This patch adds the infrastructure necessary for Bevy to support
*bindless resources*, by adding a new `#[bindless]` attribute to
`AsBindGroup`.

Classically, only a single texture (or sampler, or buffer) can be
attached to each shader binding. This means that switching materials
requires breaking a batch and issuing a new drawcall, even if the mesh
is otherwise identical. This adds significant overhead not only in the
driver but also in `wgpu`, as switching bind groups increases the amount
of validation work that `wgpu` must do.

*Bindless resources* are the typical solution to this problem. Instead
of switching bindings between each texture, the renderer instead
supplies a large *array* of all textures in the scene up front, and the
material contains an index into that array. This pattern is repeated for
buffers and samplers as well. The renderer now no longer needs to switch
binding descriptor sets while drawing the scene.

Unfortunately, as things currently stand, this approach won't quite work
for Bevy. Two aspects of `wgpu` conspire to make this ideal approach
unacceptably slow:

1. In the DX12 backend, all binding arrays (bindless resources) must
have a constant size declared in the shader, and all textures in an
array must be bound to actual textures. Changing the size requires a
recompile.

2. Changing even one texture incurs revalidation of all textures, a
process that takes time that's linear in the total size of the binding
array.

This means that declaring a large array of textures big enough to
encompass the entire scene is presently unacceptably slow. For example,
if you declare 4096 textures, then `wgpu` will have to revalidate all
4096 textures if even a single one changes. This process can take
multiple frames.

To work around this problem, this PR groups bindless resources into
small *slabs* and maintains a free list for each. The size of each slab
for the bindless arrays associated with a material is specified via the
`#[bindless(N)]` attribute. For instance, consider the following
declaration:

```rust
#[derive(AsBindGroup)]
#[bindless(16)]
struct MyMaterial {
    #[buffer(0)]
    color: Vec4,
    #[texture(1)]
    #[sampler(2)]
    diffuse: Handle<Image>,
}
```

The `#[bindless(N)]` attribute specifies that, if bindless arrays are
supported on the current platform, each resource becomes a binding array
of N instances of that resource. So, for `MyMaterial` above, the `color`
attribute is exposed to the shader as `binding_array<vec4<f32>, 16>`,
the `diffuse` texture is exposed to the shader as
`binding_array<texture_2d<f32>, 16>`, and the `diffuse` sampler is
exposed to the shader as `binding_array<sampler, 16>`. Inside the
material's vertex and fragment shaders, the applicable index is
available via the `material_bind_group_slot` field of the `Mesh`
structure. So, for instance, you can access the current color like so:

```wgsl
// `uniform` binding arrays are a non-sequitur, so `uniform` is automatically promoted
// to `storage` in bindless mode.
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
    ...
}
```

Note that portable shader code can't guarantee that the current platform
supports bindless textures. Indeed, bindless mode is only available in
Vulkan and DX12. The `BINDLESS` shader definition is available for your
use to determine whether you're on a bindless platform or not. Thus a
portable version of the shader above would look like:

```wgsl
#ifdef BINDLESS
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
#else // BINDLESS
@group(2) @binding(0) var<uniform> material_color: Color;
#endif // BINDLESS
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
#ifdef BINDLESS
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
#else // BINDLESS
    let color = material_color;
#endif // BINDLESS
    ...
}
```

Importantly, this PR *doesn't* update `StandardMaterial` to be bindless.
So, for example, `scene_viewer` will currently not run any faster. I
intend to update `StandardMaterial` to use bindless mode in a follow-up
patch.

A new example, `shaders/shader_material_bindless`, has been added to
demonstrate how to use this new feature.

Here's a Tracy profile of `submit_graph_commands` of this patch and an
additional patch (not submitted yet) that makes `StandardMaterial` use
bindless. Red is those patches; yellow is `main`. The scene was Bistro
Exterior with a hack that forces all textures to opaque. You can see a
1.47x mean speedup.
![Screenshot 2024-11-12
161713](https://github.com/user-attachments/assets/4334b362-42c8-4d64-9cfb-6835f019b95c)

## Migration Guide

* `RenderAssets::prepare_asset` now takes an `AssetId` parameter.
* Bin keys now have Bevy-specific material bind group indices instead of
`wgpu` material bind group IDs, as part of the bindless change. Use the
new `MaterialBindGroupAllocator` to map from bind group index to bind
group ID.
2024-12-03 18:00:34 +00:00

451 lines
16 KiB
Rust

use crate::{
render_resource::AsBindGroupError, ExtractSchedule, MainWorld, Render, RenderApp, RenderSet,
};
use bevy_app::{App, Plugin, SubApp};
pub use bevy_asset::RenderAssetUsages;
use bevy_asset::{Asset, AssetEvent, AssetId, Assets};
use bevy_ecs::{
prelude::{Commands, EventReader, IntoSystemConfigs, ResMut, Resource},
schedule::{SystemConfigs, SystemSet},
system::{StaticSystemParam, SystemParam, SystemParamItem, SystemState},
world::{FromWorld, Mut},
};
use bevy_render_macros::ExtractResource;
use bevy_utils::{
tracing::{debug, error},
HashMap, HashSet,
};
use core::marker::PhantomData;
use derive_more::derive::{Display, Error};
#[derive(Debug, Error, Display)]
pub enum PrepareAssetError<E: Send + Sync + 'static> {
#[display("Failed to prepare asset")]
RetryNextUpdate(E),
#[display("Failed to build bind group: {_0}")]
AsBindGroupError(AsBindGroupError),
}
/// The system set during which we extract modified assets to the render world.
#[derive(SystemSet, Clone, PartialEq, Eq, Debug, Hash)]
pub struct ExtractAssetsSet;
/// Describes how an asset gets extracted and prepared for rendering.
///
/// In the [`ExtractSchedule`] step the [`RenderAsset::SourceAsset`] is transferred
/// from the "main world" into the "render world".
///
/// After that in the [`RenderSet::PrepareAssets`] step the extracted asset
/// is transformed into its GPU-representation of type [`RenderAsset`].
pub trait RenderAsset: Send + Sync + 'static + Sized {
/// The representation of the asset in the "main world".
type SourceAsset: Asset + Clone;
/// Specifies all ECS data required by [`RenderAsset::prepare_asset`].
///
/// For convenience use the [`lifetimeless`](bevy_ecs::system::lifetimeless) [`SystemParam`].
type Param: SystemParam;
/// Whether or not to unload the asset after extracting it to the render world.
#[inline]
fn asset_usage(_source_asset: &Self::SourceAsset) -> RenderAssetUsages {
RenderAssetUsages::default()
}
/// Size of the data the asset will upload to the gpu. Specifying a return value
/// will allow the asset to be throttled via [`RenderAssetBytesPerFrame`].
#[inline]
#[allow(unused_variables)]
fn byte_len(source_asset: &Self::SourceAsset) -> Option<usize> {
None
}
/// Prepares the [`RenderAsset::SourceAsset`] for the GPU by transforming it into a [`RenderAsset`].
///
/// ECS data may be accessed via `param`.
fn prepare_asset(
source_asset: Self::SourceAsset,
asset_id: AssetId<Self::SourceAsset>,
param: &mut SystemParamItem<Self::Param>,
) -> Result<Self, PrepareAssetError<Self::SourceAsset>>;
/// Called whenever the [`RenderAsset::SourceAsset`] has been removed.
///
/// You can implement this method if you need to access ECS data (via
/// `_param`) in order to perform cleanup tasks when the asset is removed.
///
/// The default implementation does nothing.
fn unload_asset(
_source_asset: AssetId<Self::SourceAsset>,
_param: &mut SystemParamItem<Self::Param>,
) {
}
}
/// This plugin extracts the changed assets from the "app world" into the "render world"
/// and prepares them for the GPU. They can then be accessed from the [`RenderAssets`] resource.
///
/// Therefore it sets up the [`ExtractSchedule`] and
/// [`RenderSet::PrepareAssets`] steps for the specified [`RenderAsset`].
///
/// The `AFTER` generic parameter can be used to specify that `A::prepare_asset` should not be run until
/// `prepare_assets::<AFTER>` has completed. This allows the `prepare_asset` function to depend on another
/// prepared [`RenderAsset`], for example `Mesh::prepare_asset` relies on `RenderAssets::<GpuImage>` for morph
/// targets, so the plugin is created as `RenderAssetPlugin::<RenderMesh, GpuImage>::default()`.
pub struct RenderAssetPlugin<A: RenderAsset, AFTER: RenderAssetDependency + 'static = ()> {
phantom: PhantomData<fn() -> (A, AFTER)>,
}
impl<A: RenderAsset, AFTER: RenderAssetDependency + 'static> Default
for RenderAssetPlugin<A, AFTER>
{
fn default() -> Self {
Self {
phantom: Default::default(),
}
}
}
impl<A: RenderAsset, AFTER: RenderAssetDependency + 'static> Plugin
for RenderAssetPlugin<A, AFTER>
{
fn build(&self, app: &mut App) {
app.init_resource::<CachedExtractRenderAssetSystemState<A>>();
if let Some(render_app) = app.get_sub_app_mut(RenderApp) {
render_app
.init_resource::<ExtractedAssets<A>>()
.init_resource::<RenderAssets<A>>()
.init_resource::<PrepareNextFrameAssets<A>>()
.add_systems(
ExtractSchedule,
extract_render_asset::<A>.in_set(ExtractAssetsSet),
);
AFTER::register_system(
render_app,
prepare_assets::<A>.in_set(RenderSet::PrepareAssets),
);
}
}
}
// helper to allow specifying dependencies between render assets
pub trait RenderAssetDependency {
fn register_system(render_app: &mut SubApp, system: SystemConfigs);
}
impl RenderAssetDependency for () {
fn register_system(render_app: &mut SubApp, system: SystemConfigs) {
render_app.add_systems(Render, system);
}
}
impl<A: RenderAsset> RenderAssetDependency for A {
fn register_system(render_app: &mut SubApp, system: SystemConfigs) {
render_app.add_systems(Render, system.after(prepare_assets::<A>));
}
}
/// Temporarily stores the extracted and removed assets of the current frame.
#[derive(Resource)]
pub struct ExtractedAssets<A: RenderAsset> {
/// The assets extracted this frame.
pub extracted: Vec<(AssetId<A::SourceAsset>, A::SourceAsset)>,
/// IDs of the assets removed this frame.
///
/// These assets will not be present in [`ExtractedAssets::extracted`].
pub removed: HashSet<AssetId<A::SourceAsset>>,
/// IDs of the assets added this frame.
pub added: HashSet<AssetId<A::SourceAsset>>,
}
impl<A: RenderAsset> Default for ExtractedAssets<A> {
fn default() -> Self {
Self {
extracted: Default::default(),
removed: Default::default(),
added: Default::default(),
}
}
}
/// Stores all GPU representations ([`RenderAsset`])
/// of [`RenderAsset::SourceAsset`] as long as they exist.
#[derive(Resource)]
pub struct RenderAssets<A: RenderAsset>(HashMap<AssetId<A::SourceAsset>, A>);
impl<A: RenderAsset> Default for RenderAssets<A> {
fn default() -> Self {
Self(Default::default())
}
}
impl<A: RenderAsset> RenderAssets<A> {
pub fn get(&self, id: impl Into<AssetId<A::SourceAsset>>) -> Option<&A> {
self.0.get(&id.into())
}
pub fn get_mut(&mut self, id: impl Into<AssetId<A::SourceAsset>>) -> Option<&mut A> {
self.0.get_mut(&id.into())
}
pub fn insert(&mut self, id: impl Into<AssetId<A::SourceAsset>>, value: A) -> Option<A> {
self.0.insert(id.into(), value)
}
pub fn remove(&mut self, id: impl Into<AssetId<A::SourceAsset>>) -> Option<A> {
self.0.remove(&id.into())
}
pub fn iter(&self) -> impl Iterator<Item = (AssetId<A::SourceAsset>, &A)> {
self.0.iter().map(|(k, v)| (*k, v))
}
pub fn iter_mut(&mut self) -> impl Iterator<Item = (AssetId<A::SourceAsset>, &mut A)> {
self.0.iter_mut().map(|(k, v)| (*k, v))
}
}
#[derive(Resource)]
struct CachedExtractRenderAssetSystemState<A: RenderAsset> {
state: SystemState<(
EventReader<'static, 'static, AssetEvent<A::SourceAsset>>,
ResMut<'static, Assets<A::SourceAsset>>,
)>,
}
impl<A: RenderAsset> FromWorld for CachedExtractRenderAssetSystemState<A> {
fn from_world(world: &mut bevy_ecs::world::World) -> Self {
Self {
state: SystemState::new(world),
}
}
}
/// This system extracts all created or modified assets of the corresponding [`RenderAsset::SourceAsset`] type
/// into the "render world".
pub(crate) fn extract_render_asset<A: RenderAsset>(
mut commands: Commands,
mut main_world: ResMut<MainWorld>,
) {
main_world.resource_scope(
|world, mut cached_state: Mut<CachedExtractRenderAssetSystemState<A>>| {
let (mut events, mut assets) = cached_state.state.get_mut(world);
let mut changed_assets = HashSet::default();
let mut removed = HashSet::default();
for event in events.read() {
#[allow(clippy::match_same_arms)]
match event {
AssetEvent::Added { id } | AssetEvent::Modified { id } => {
changed_assets.insert(*id);
}
AssetEvent::Removed { .. } => {}
AssetEvent::Unused { id } => {
changed_assets.remove(id);
removed.insert(*id);
}
AssetEvent::LoadedWithDependencies { .. } => {
// TODO: handle this
}
}
}
let mut extracted_assets = Vec::new();
let mut added = HashSet::new();
for id in changed_assets.drain() {
if let Some(asset) = assets.get(id) {
let asset_usage = A::asset_usage(asset);
if asset_usage.contains(RenderAssetUsages::RENDER_WORLD) {
if asset_usage == RenderAssetUsages::RENDER_WORLD {
if let Some(asset) = assets.remove(id) {
extracted_assets.push((id, asset));
added.insert(id);
}
} else {
extracted_assets.push((id, asset.clone()));
added.insert(id);
}
}
}
}
commands.insert_resource(ExtractedAssets::<A> {
extracted: extracted_assets,
removed,
added,
});
cached_state.state.apply(world);
},
);
}
// TODO: consider storing inside system?
/// All assets that should be prepared next frame.
#[derive(Resource)]
pub struct PrepareNextFrameAssets<A: RenderAsset> {
assets: Vec<(AssetId<A::SourceAsset>, A::SourceAsset)>,
}
impl<A: RenderAsset> Default for PrepareNextFrameAssets<A> {
fn default() -> Self {
Self {
assets: Default::default(),
}
}
}
/// This system prepares all assets of the corresponding [`RenderAsset::SourceAsset`] type
/// which where extracted this frame for the GPU.
pub fn prepare_assets<A: RenderAsset>(
mut extracted_assets: ResMut<ExtractedAssets<A>>,
mut render_assets: ResMut<RenderAssets<A>>,
mut prepare_next_frame: ResMut<PrepareNextFrameAssets<A>>,
param: StaticSystemParam<<A as RenderAsset>::Param>,
mut bpf: ResMut<RenderAssetBytesPerFrame>,
) {
let mut wrote_asset_count = 0;
let mut param = param.into_inner();
let queued_assets = core::mem::take(&mut prepare_next_frame.assets);
for (id, extracted_asset) in queued_assets {
if extracted_assets.removed.contains(&id) || extracted_assets.added.contains(&id) {
// skip previous frame's assets that have been removed or updated
continue;
}
let write_bytes = if let Some(size) = A::byte_len(&extracted_asset) {
// we could check if available bytes > byte_len here, but we want to make some
// forward progress even if the asset is larger than the max bytes per frame.
// this way we always write at least one (sized) asset per frame.
// in future we could also consider partial asset uploads.
if bpf.exhausted() {
prepare_next_frame.assets.push((id, extracted_asset));
continue;
}
size
} else {
0
};
match A::prepare_asset(extracted_asset, id, &mut param) {
Ok(prepared_asset) => {
render_assets.insert(id, prepared_asset);
bpf.write_bytes(write_bytes);
wrote_asset_count += 1;
}
Err(PrepareAssetError::RetryNextUpdate(extracted_asset)) => {
prepare_next_frame.assets.push((id, extracted_asset));
}
Err(PrepareAssetError::AsBindGroupError(e)) => {
error!(
"{} Bind group construction failed: {e}",
core::any::type_name::<A>()
);
}
}
}
for removed in extracted_assets.removed.drain() {
render_assets.remove(removed);
A::unload_asset(removed, &mut param);
}
for (id, extracted_asset) in extracted_assets.extracted.drain(..) {
// we remove previous here to ensure that if we are updating the asset then
// any users will not see the old asset after a new asset is extracted,
// even if the new asset is not yet ready or we are out of bytes to write.
render_assets.remove(id);
let write_bytes = if let Some(size) = A::byte_len(&extracted_asset) {
if bpf.exhausted() {
prepare_next_frame.assets.push((id, extracted_asset));
continue;
}
size
} else {
0
};
match A::prepare_asset(extracted_asset, id, &mut param) {
Ok(prepared_asset) => {
render_assets.insert(id, prepared_asset);
bpf.write_bytes(write_bytes);
wrote_asset_count += 1;
}
Err(PrepareAssetError::RetryNextUpdate(extracted_asset)) => {
prepare_next_frame.assets.push((id, extracted_asset));
}
Err(PrepareAssetError::AsBindGroupError(e)) => {
error!(
"{} Bind group construction failed: {e}",
core::any::type_name::<A>()
);
}
}
}
if bpf.exhausted() && !prepare_next_frame.assets.is_empty() {
debug!(
"{} write budget exhausted with {} assets remaining (wrote {})",
core::any::type_name::<A>(),
prepare_next_frame.assets.len(),
wrote_asset_count
);
}
}
/// A resource that attempts to limit the amount of data transferred from cpu to gpu
/// each frame, preventing choppy frames at the cost of waiting longer for gpu assets
/// to become available
#[derive(Resource, Default, Debug, Clone, Copy, ExtractResource)]
pub struct RenderAssetBytesPerFrame {
pub max_bytes: Option<usize>,
pub available: usize,
}
impl RenderAssetBytesPerFrame {
/// `max_bytes`: the number of bytes to write per frame.
/// this is a soft limit: only full assets are written currently, uploading stops
/// after the first asset that exceeds the limit.
/// To participate, assets should implement [`RenderAsset::byte_len`]. If the default
/// is not overridden, the assets are assumed to be small enough to upload without restriction.
pub fn new(max_bytes: usize) -> Self {
Self {
max_bytes: Some(max_bytes),
available: 0,
}
}
/// Reset the available bytes. Called once per frame by the [`crate::RenderPlugin`].
pub fn reset(&mut self) {
self.available = self.max_bytes.unwrap_or(usize::MAX);
}
/// check how many bytes are available since the last reset
pub fn available_bytes(&self, required_bytes: usize) -> usize {
if self.max_bytes.is_none() {
return required_bytes;
}
required_bytes.min(self.available)
}
/// decrease the available bytes for the current frame
fn write_bytes(&mut self, bytes: usize) {
if self.max_bytes.is_none() {
return;
}
let write_bytes = bytes.min(self.available);
self.available -= write_bytes;
}
// check if any bytes remain available for writing this frame
fn exhausted(&self) -> bool {
self.max_bytes.is_some() && self.available == 0
}
}