bevy/examples/shader/custom_shader_instancing.rs
Chris Russell f7e112a3c9
Let query items borrow from query state to avoid needing to clone (#15396)
# Objective

Improve the performance of `FilteredEntity(Ref|Mut)` and
`Entity(Ref|Mut)Except`.

`FilteredEntityRef` needs an `Access<ComponentId>` to determine what
components it can access. There is one stored in the query state, but
query items cannot borrow from the state, so it has to `clone()` the
access for each row. Cloning the access involves memory allocations and
can be expensive.


## Solution

Let query items borrow from their query state.  

Add an `'s` lifetime to `WorldQuery::Item` and `WorldQuery::Fetch`,
similar to the one in `SystemParam`, and provide `&'s Self::State` to
the fetch so that it can borrow from the state.

Unfortunately, there are a few cases where we currently return query
items from temporary query states: the sorted iteration methods create a
temporary state to query the sort keys, and the
`EntityRef::components<Q>()` methods create a temporary state for their
query.

To allow these to continue to work with most `QueryData`
implementations, introduce a new subtrait `ReleaseStateQueryData` that
converts a `QueryItem<'w, 's>` to `QueryItem<'w, 'static>`, and is
implemented for everything except `FilteredEntity(Ref|Mut)` and
`Entity(Ref|Mut)Except`.

`#[derive(QueryData)]` will generate `ReleaseStateQueryData`
implementations that apply when all of the subqueries implement
`ReleaseStateQueryData`.

This PR does not actually change the implementation of
`FilteredEntity(Ref|Mut)` or `Entity(Ref|Mut)Except`! That will be done
as a follow-up PR so that the changes are easier to review. I have
pushed the changes as chescock/bevy#5.

## Testing

I ran performance traces of many_foxes, both against main and against
chescock/bevy#5, both including #15282. These changes do appear to make
generalized animation a bit faster:

(Red is main, yellow is chescock/bevy#5)

![image](https://github.com/user-attachments/assets/de900117-0c6a-431d-ab62-c013834f97a9)


## Migration Guide

The `WorldQuery::Item` and `WorldQuery::Fetch` associated types and the
`QueryItem` and `ROQueryItem` type aliases now have an additional
lifetime parameter corresponding to the `'s` lifetime in `Query`. Manual
implementations of `WorldQuery` will need to update the method
signatures to include the new lifetimes. Other uses of the types will
need to be updated to include a lifetime parameter, although it can
usually be passed as `'_`. In particular, `ROQueryItem` is used when
implementing `RenderCommand`.

Before: 

```rust
fn render<'w>(
    item: &P,
    view: ROQueryItem<'w, Self::ViewQuery>,
    entity: Option<ROQueryItem<'w, Self::ItemQuery>>,
    param: SystemParamItem<'w, '_, Self::Param>,
    pass: &mut TrackedRenderPass<'w>,
) -> RenderCommandResult;
```

After: 

```rust
fn render<'w>(
    item: &P,
    view: ROQueryItem<'w, '_, Self::ViewQuery>,
    entity: Option<ROQueryItem<'w, '_, Self::ItemQuery>>,
    param: SystemParamItem<'w, '_, Self::Param>,
    pass: &mut TrackedRenderPass<'w>,
) -> RenderCommandResult;
```

---

Methods on `QueryState` that take `&mut self` may now result in
conflicting borrows if the query items capture the lifetime of the
mutable reference. This affects `get()`, `iter()`, and others. To fix
the errors, first call `QueryState::update_archetypes()`, and then
replace a call `state.foo(world, param)` with
`state.query_manual(world).foo_inner(param)`. Alternately, you may be
able to restructure the code to call `state.query(world)` once and then
make multiple calls using the `Query`.

Before:
```rust
let mut state: QueryState<_, _> = ...;
let d1 = state.get(world, e1);
let d2 = state.get(world, e2); // Error: cannot borrow `state` as mutable more than once at a time
println!("{d1:?}");
println!("{d2:?}");
```

After: 
```rust
let mut state: QueryState<_, _> = ...;

state.update_archetypes(world);
let d1 = state.get_manual(world, e1);
let d2 = state.get_manual(world, e2);
// OR
state.update_archetypes(world);
let d1 = state.query(world).get_inner(e1);
let d2 = state.query(world).get_inner(e2);
// OR
let query = state.query(world);
let d1 = query.get_inner(e1);
let d1 = query.get_inner(e2);

println!("{d1:?}");
println!("{d2:?}");
```
2025-06-16 21:05:41 +00:00

321 lines
11 KiB
Rust

//! A shader that renders a mesh multiple times in one draw call.
//!
//! Bevy will automatically batch and instance your meshes assuming you use the same
//! `Handle<Material>` and `Handle<Mesh>` for all of your instances.
//!
//! This example is intended for advanced users and shows how to make a custom instancing
//! implementation using bevy's low level rendering api.
//! It's generally recommended to try the built-in instancing before going with this approach.
use bevy::{
core_pipeline::core_3d::Transparent3d,
ecs::{
query::QueryItem,
system::{lifetimeless::*, SystemParamItem},
},
pbr::{
MeshPipeline, MeshPipelineKey, RenderMeshInstances, SetMeshBindGroup, SetMeshViewBindGroup,
},
prelude::*,
render::{
extract_component::{ExtractComponent, ExtractComponentPlugin},
mesh::{
allocator::MeshAllocator, MeshVertexBufferLayoutRef, RenderMesh, RenderMeshBufferInfo,
},
render_asset::RenderAssets,
render_phase::{
AddRenderCommand, DrawFunctions, PhaseItem, PhaseItemExtraIndex, RenderCommand,
RenderCommandResult, SetItemPipeline, TrackedRenderPass, ViewSortedRenderPhases,
},
render_resource::*,
renderer::RenderDevice,
sync_world::MainEntity,
view::{ExtractedView, NoFrustumCulling, NoIndirectDrawing},
Render, RenderApp, RenderSystems,
},
};
use bytemuck::{Pod, Zeroable};
/// This example uses a shader source file from the assets subdirectory
const SHADER_ASSET_PATH: &str = "shaders/instancing.wgsl";
fn main() {
App::new()
.add_plugins((DefaultPlugins, CustomMaterialPlugin))
.add_systems(Startup, setup)
.run();
}
fn setup(mut commands: Commands, mut meshes: ResMut<Assets<Mesh>>) {
commands.spawn((
Mesh3d(meshes.add(Cuboid::new(0.5, 0.5, 0.5))),
InstanceMaterialData(
(1..=10)
.flat_map(|x| (1..=10).map(move |y| (x as f32 / 10.0, y as f32 / 10.0)))
.map(|(x, y)| InstanceData {
position: Vec3::new(x * 10.0 - 5.0, y * 10.0 - 5.0, 0.0),
scale: 1.0,
color: LinearRgba::from(Color::hsla(x * 360., y, 0.5, 1.0)).to_f32_array(),
})
.collect(),
),
// NOTE: Frustum culling is done based on the Aabb of the Mesh and the GlobalTransform.
// As the cube is at the origin, if its Aabb moves outside the view frustum, all the
// instanced cubes will be culled.
// The InstanceMaterialData contains the 'GlobalTransform' information for this custom
// instancing, and that is not taken into account with the built-in frustum culling.
// We must disable the built-in frustum culling by adding the `NoFrustumCulling` marker
// component to avoid incorrect culling.
NoFrustumCulling,
));
// camera
commands.spawn((
Camera3d::default(),
Transform::from_xyz(0.0, 0.0, 15.0).looking_at(Vec3::ZERO, Vec3::Y),
// We need this component because we use `draw_indexed` and `draw`
// instead of `draw_indirect_indexed` and `draw_indirect` in
// `DrawMeshInstanced::render`.
NoIndirectDrawing,
));
}
#[derive(Component, Deref)]
struct InstanceMaterialData(Vec<InstanceData>);
impl ExtractComponent for InstanceMaterialData {
type QueryData = &'static InstanceMaterialData;
type QueryFilter = ();
type Out = Self;
fn extract_component(item: QueryItem<'_, '_, Self::QueryData>) -> Option<Self> {
Some(InstanceMaterialData(item.0.clone()))
}
}
struct CustomMaterialPlugin;
impl Plugin for CustomMaterialPlugin {
fn build(&self, app: &mut App) {
app.add_plugins(ExtractComponentPlugin::<InstanceMaterialData>::default());
app.sub_app_mut(RenderApp)
.add_render_command::<Transparent3d, DrawCustom>()
.init_resource::<SpecializedMeshPipelines<CustomPipeline>>()
.add_systems(
Render,
(
queue_custom.in_set(RenderSystems::QueueMeshes),
prepare_instance_buffers.in_set(RenderSystems::PrepareResources),
),
);
}
fn finish(&self, app: &mut App) {
app.sub_app_mut(RenderApp).init_resource::<CustomPipeline>();
}
}
#[derive(Clone, Copy, Pod, Zeroable)]
#[repr(C)]
struct InstanceData {
position: Vec3,
scale: f32,
color: [f32; 4],
}
fn queue_custom(
transparent_3d_draw_functions: Res<DrawFunctions<Transparent3d>>,
custom_pipeline: Res<CustomPipeline>,
mut pipelines: ResMut<SpecializedMeshPipelines<CustomPipeline>>,
pipeline_cache: Res<PipelineCache>,
meshes: Res<RenderAssets<RenderMesh>>,
render_mesh_instances: Res<RenderMeshInstances>,
material_meshes: Query<(Entity, &MainEntity), With<InstanceMaterialData>>,
mut transparent_render_phases: ResMut<ViewSortedRenderPhases<Transparent3d>>,
views: Query<(&ExtractedView, &Msaa)>,
) {
let draw_custom = transparent_3d_draw_functions.read().id::<DrawCustom>();
for (view, msaa) in &views {
let Some(transparent_phase) = transparent_render_phases.get_mut(&view.retained_view_entity)
else {
continue;
};
let msaa_key = MeshPipelineKey::from_msaa_samples(msaa.samples());
let view_key = msaa_key | MeshPipelineKey::from_hdr(view.hdr);
let rangefinder = view.rangefinder3d();
for (entity, main_entity) in &material_meshes {
let Some(mesh_instance) = render_mesh_instances.render_mesh_queue_data(*main_entity)
else {
continue;
};
let Some(mesh) = meshes.get(mesh_instance.mesh_asset_id) else {
continue;
};
let key =
view_key | MeshPipelineKey::from_primitive_topology(mesh.primitive_topology());
let pipeline = pipelines
.specialize(&pipeline_cache, &custom_pipeline, key, &mesh.layout)
.unwrap();
transparent_phase.add(Transparent3d {
entity: (entity, *main_entity),
pipeline,
draw_function: draw_custom,
distance: rangefinder.distance_translation(&mesh_instance.translation),
batch_range: 0..1,
extra_index: PhaseItemExtraIndex::None,
indexed: true,
});
}
}
}
#[derive(Component)]
struct InstanceBuffer {
buffer: Buffer,
length: usize,
}
fn prepare_instance_buffers(
mut commands: Commands,
query: Query<(Entity, &InstanceMaterialData)>,
render_device: Res<RenderDevice>,
) {
for (entity, instance_data) in &query {
let buffer = render_device.create_buffer_with_data(&BufferInitDescriptor {
label: Some("instance data buffer"),
contents: bytemuck::cast_slice(instance_data.as_slice()),
usage: BufferUsages::VERTEX | BufferUsages::COPY_DST,
});
commands.entity(entity).insert(InstanceBuffer {
buffer,
length: instance_data.len(),
});
}
}
#[derive(Resource)]
struct CustomPipeline {
shader: Handle<Shader>,
mesh_pipeline: MeshPipeline,
}
impl FromWorld for CustomPipeline {
fn from_world(world: &mut World) -> Self {
let mesh_pipeline = world.resource::<MeshPipeline>();
CustomPipeline {
shader: world.load_asset(SHADER_ASSET_PATH),
mesh_pipeline: mesh_pipeline.clone(),
}
}
}
impl SpecializedMeshPipeline for CustomPipeline {
type Key = MeshPipelineKey;
fn specialize(
&self,
key: Self::Key,
layout: &MeshVertexBufferLayoutRef,
) -> Result<RenderPipelineDescriptor, SpecializedMeshPipelineError> {
let mut descriptor = self.mesh_pipeline.specialize(key, layout)?;
descriptor.vertex.shader = self.shader.clone();
descriptor.vertex.buffers.push(VertexBufferLayout {
array_stride: size_of::<InstanceData>() as u64,
step_mode: VertexStepMode::Instance,
attributes: vec![
VertexAttribute {
format: VertexFormat::Float32x4,
offset: 0,
shader_location: 3, // shader locations 0-2 are taken up by Position, Normal and UV attributes
},
VertexAttribute {
format: VertexFormat::Float32x4,
offset: VertexFormat::Float32x4.size(),
shader_location: 4,
},
],
});
descriptor.fragment.as_mut().unwrap().shader = self.shader.clone();
Ok(descriptor)
}
}
type DrawCustom = (
SetItemPipeline,
SetMeshViewBindGroup<0>,
SetMeshBindGroup<1>,
DrawMeshInstanced,
);
struct DrawMeshInstanced;
impl<P: PhaseItem> RenderCommand<P> for DrawMeshInstanced {
type Param = (
SRes<RenderAssets<RenderMesh>>,
SRes<RenderMeshInstances>,
SRes<MeshAllocator>,
);
type ViewQuery = ();
type ItemQuery = Read<InstanceBuffer>;
#[inline]
fn render<'w>(
item: &P,
_view: (),
instance_buffer: Option<&'w InstanceBuffer>,
(meshes, render_mesh_instances, mesh_allocator): SystemParamItem<'w, '_, Self::Param>,
pass: &mut TrackedRenderPass<'w>,
) -> RenderCommandResult {
// A borrow check workaround.
let mesh_allocator = mesh_allocator.into_inner();
let Some(mesh_instance) = render_mesh_instances.render_mesh_queue_data(item.main_entity())
else {
return RenderCommandResult::Skip;
};
let Some(gpu_mesh) = meshes.into_inner().get(mesh_instance.mesh_asset_id) else {
return RenderCommandResult::Skip;
};
let Some(instance_buffer) = instance_buffer else {
return RenderCommandResult::Skip;
};
let Some(vertex_buffer_slice) =
mesh_allocator.mesh_vertex_slice(&mesh_instance.mesh_asset_id)
else {
return RenderCommandResult::Skip;
};
pass.set_vertex_buffer(0, vertex_buffer_slice.buffer.slice(..));
pass.set_vertex_buffer(1, instance_buffer.buffer.slice(..));
match &gpu_mesh.buffer_info {
RenderMeshBufferInfo::Indexed {
index_format,
count,
} => {
let Some(index_buffer_slice) =
mesh_allocator.mesh_index_slice(&mesh_instance.mesh_asset_id)
else {
return RenderCommandResult::Skip;
};
pass.set_index_buffer(index_buffer_slice.buffer.slice(..), 0, *index_format);
pass.draw_indexed(
index_buffer_slice.range.start..(index_buffer_slice.range.start + count),
vertex_buffer_slice.range.start as i32,
0..instance_buffer.length as u32,
);
}
RenderMeshBufferInfo::NonIndexed => {
pass.draw(vertex_buffer_slice.range, 0..instance_buffer.length as u32);
}
}
RenderCommandResult::Success
}
}