
Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process.

This commit instead makes Bevy place phase items, where applicable, into *bins* keyed by *bin keys*, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects.

Not all phases can be binned; some, such as the transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`; `BinnedPhaseItem` has the new logic.

Frame time results (in ms/frame) are as follows:

| Benchmark              | `binning` | `main`  | Speedup |
| ---------------------- | --------- | ------- | ------- |
| `many_cubes -nfc -vpi` | 232.179   | 312.123 | 34.43%  |
| `many_cubes -nfc`      | 25.874    | 30.117  | 16.40%  |
| `many_foxes`           | 3.276     | 3.515   | 7.30%   |

(`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-material-data-per-instance`.)

---

## Changelog

### Changed

* Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step.
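The binning scheme described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not Bevy's implementation: `BinKey`, `BinnedPhase`, and the `u32` entity IDs are hypothetical stand-ins, and the key here is assumed to be just a pipeline ID plus a mesh ID. It shows the two properties the description relies on: only the distinct keys are sorted, and each bin is already one batch.

```rust
use std::collections::HashMap;

/// Hypothetical bin key: items with equal keys are assumed to be batchable
/// together (here, same pipeline and same mesh).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct BinKey {
    pipeline: u32,
    mesh: u32,
}

/// Hypothetical binned phase. Entities (plain `u32` IDs here) are grouped by
/// key; only the distinct keys are ever sorted.
#[derive(Default)]
struct BinnedPhase {
    bins: HashMap<BinKey, Vec<u32>>,
    sorted_keys: Vec<BinKey>,
}

impl BinnedPhase {
    /// Adding an item is a hash-map insert; no ordering work happens here.
    fn add(&mut self, key: BinKey, entity: u32) {
        self.bins.entry(key).or_default().push(entity);
    }

    /// Sorts the distinct bin keys, so the cost depends on the number of
    /// bins, not on the number of entities.
    fn sort(&mut self) {
        self.sorted_keys = self.bins.keys().copied().collect();
        self.sorted_keys.sort_unstable();
    }

    /// Each bin is one batch: its boundaries are implicit in the bin itself,
    /// so no scan over neighboring items is needed to find them.
    fn batches(&self) -> Vec<(BinKey, usize)> {
        self.sorted_keys
            .iter()
            .map(|key| (*key, self.bins[key].len()))
            .collect()
    }
}

fn main() {
    let mut phase = BinnedPhase::default();
    // 1,000 entities, but only two distinct (pipeline, mesh) combinations.
    for entity in 0..1000u32 {
        phase.add(BinKey { pipeline: 0, mesh: entity % 2 }, entity);
    }
    phase.sort();
    for (key, count) in phase.batches() {
        println!("{key:?}: {count} entities in one batch");
    }
}
```

With two bins, `sort` orders two keys rather than 1,000 items. Phases whose items must be drawn in a per-item order, such as the transparent phase, do not fit this model, which is why they keep a sorted phase.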
## Migration Guide

- `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance.

## Tracy graphs

`many_cubes --no-frustum-culling`, `main` branch:

<img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55">

`many_cubes --no-frustum-culling`, this branch:

<img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c">

You can see that `batch_and_prepare_binned_render_phase` takes a much smaller fraction of the frame time. Zooming in on that function, with yellow being this branch and red being `main`, we see:

<img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0">

The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`:

<img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96">

We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
277 lines
9.3 KiB
Rust
```rust
//! A shader that renders a mesh multiple times in one draw call.

use bevy::{
    core_pipeline::core_3d::Transparent3d,
    ecs::{
        query::QueryItem,
        system::{lifetimeless::*, SystemParamItem},
    },
    pbr::{
        MeshPipeline, MeshPipelineKey, RenderMeshInstances, SetMeshBindGroup, SetMeshViewBindGroup,
    },
    prelude::*,
    render::{
        extract_component::{ExtractComponent, ExtractComponentPlugin},
        mesh::{GpuBufferInfo, MeshVertexBufferLayoutRef},
        render_asset::RenderAssets,
        render_phase::{
            AddRenderCommand, DrawFunctions, PhaseItem, RenderCommand, RenderCommandResult,
            SetItemPipeline, SortedRenderPhase, TrackedRenderPass,
        },
        render_resource::*,
        renderer::RenderDevice,
        view::{ExtractedView, NoFrustumCulling},
        Render, RenderApp, RenderSet,
    },
};
use bytemuck::{Pod, Zeroable};

fn main() {
    App::new()
        .add_plugins((DefaultPlugins, CustomMaterialPlugin))
        .add_systems(Startup, setup)
        .run();
}

fn setup(mut commands: Commands, mut meshes: ResMut<Assets<Mesh>>) {
    commands.spawn((
        meshes.add(Cuboid::new(0.5, 0.5, 0.5)),
        SpatialBundle::INHERITED_IDENTITY,
        InstanceMaterialData(
            (1..=10)
                .flat_map(|x| (1..=10).map(move |y| (x as f32 / 10.0, y as f32 / 10.0)))
                .map(|(x, y)| InstanceData {
                    position: Vec3::new(x * 10.0 - 5.0, y * 10.0 - 5.0, 0.0),
                    scale: 1.0,
                    color: LinearRgba::from(Color::hsla(x * 360., y, 0.5, 1.0)).to_f32_array(),
                })
                .collect(),
        ),
        // NOTE: Frustum culling is done based on the Aabb of the Mesh and the GlobalTransform.
        // As the cube is at the origin, if its Aabb moves outside the view frustum, all the
        // instanced cubes will be culled.
        // The InstanceMaterialData contains the 'GlobalTransform' information for this custom
        // instancing, and that is not taken into account with the built-in frustum culling.
        // We must disable the built-in frustum culling by adding the `NoFrustumCulling` marker
        // component to avoid incorrect culling.
        NoFrustumCulling,
    ));

    // camera
    commands.spawn(Camera3dBundle {
        transform: Transform::from_xyz(0.0, 0.0, 15.0).looking_at(Vec3::ZERO, Vec3::Y),
        ..default()
    });
}

#[derive(Component, Deref)]
struct InstanceMaterialData(Vec<InstanceData>);

impl ExtractComponent for InstanceMaterialData {
    type QueryData = &'static InstanceMaterialData;
    type QueryFilter = ();
    type Out = Self;

    fn extract_component(item: QueryItem<'_, Self::QueryData>) -> Option<Self> {
        Some(InstanceMaterialData(item.0.clone()))
    }
}

struct CustomMaterialPlugin;

impl Plugin for CustomMaterialPlugin {
    fn build(&self, app: &mut App) {
        app.add_plugins(ExtractComponentPlugin::<InstanceMaterialData>::default());
        app.sub_app_mut(RenderApp)
            .add_render_command::<Transparent3d, DrawCustom>()
            .init_resource::<SpecializedMeshPipelines<CustomPipeline>>()
            .add_systems(
                Render,
                (
                    queue_custom.in_set(RenderSet::QueueMeshes),
                    prepare_instance_buffers.in_set(RenderSet::PrepareResources),
                ),
            );
    }

    fn finish(&self, app: &mut App) {
        app.sub_app_mut(RenderApp).init_resource::<CustomPipeline>();
    }
}

#[derive(Clone, Copy, Pod, Zeroable)]
#[repr(C)]
struct InstanceData {
    position: Vec3,
    scale: f32,
    color: [f32; 4],
}

#[allow(clippy::too_many_arguments)]
fn queue_custom(
    transparent_3d_draw_functions: Res<DrawFunctions<Transparent3d>>,
    custom_pipeline: Res<CustomPipeline>,
    msaa: Res<Msaa>,
    mut pipelines: ResMut<SpecializedMeshPipelines<CustomPipeline>>,
    pipeline_cache: Res<PipelineCache>,
    meshes: Res<RenderAssets<Mesh>>,
    render_mesh_instances: Res<RenderMeshInstances>,
    material_meshes: Query<Entity, With<InstanceMaterialData>>,
    mut views: Query<(&ExtractedView, &mut SortedRenderPhase<Transparent3d>)>,
) {
    let draw_custom = transparent_3d_draw_functions.read().id::<DrawCustom>();

    let msaa_key = MeshPipelineKey::from_msaa_samples(msaa.samples());

    for (view, mut transparent_phase) in &mut views {
        let view_key = msaa_key | MeshPipelineKey::from_hdr(view.hdr);
        let rangefinder = view.rangefinder3d();
        for entity in &material_meshes {
            let Some(mesh_instance) = render_mesh_instances.get(&entity) else {
                continue;
            };
            let Some(mesh) = meshes.get(mesh_instance.mesh_asset_id) else {
                continue;
            };
            let key = view_key | MeshPipelineKey::from_primitive_topology(mesh.primitive_topology);
            let pipeline = pipelines
                .specialize(&pipeline_cache, &custom_pipeline, key, &mesh.layout)
                .unwrap();
            transparent_phase.add(Transparent3d {
                entity,
                pipeline,
                draw_function: draw_custom,
                distance: rangefinder
                    .distance_translation(&mesh_instance.transforms.transform.translation),
                batch_range: 0..1,
                dynamic_offset: None,
            });
        }
    }
}

#[derive(Component)]
struct InstanceBuffer {
    buffer: Buffer,
    length: usize,
}

fn prepare_instance_buffers(
    mut commands: Commands,
    query: Query<(Entity, &InstanceMaterialData)>,
    render_device: Res<RenderDevice>,
) {
    for (entity, instance_data) in &query {
        let buffer = render_device.create_buffer_with_data(&BufferInitDescriptor {
            label: Some("instance data buffer"),
            contents: bytemuck::cast_slice(instance_data.as_slice()),
            usage: BufferUsages::VERTEX | BufferUsages::COPY_DST,
        });
        commands.entity(entity).insert(InstanceBuffer {
            buffer,
            length: instance_data.len(),
        });
    }
}

#[derive(Resource)]
struct CustomPipeline {
    shader: Handle<Shader>,
    mesh_pipeline: MeshPipeline,
}

impl FromWorld for CustomPipeline {
    fn from_world(world: &mut World) -> Self {
        let mesh_pipeline = world.resource::<MeshPipeline>();

        CustomPipeline {
            shader: world.load_asset("shaders/instancing.wgsl"),
            mesh_pipeline: mesh_pipeline.clone(),
        }
    }
}

impl SpecializedMeshPipeline for CustomPipeline {
    type Key = MeshPipelineKey;

    fn specialize(
        &self,
        key: Self::Key,
        layout: &MeshVertexBufferLayoutRef,
    ) -> Result<RenderPipelineDescriptor, SpecializedMeshPipelineError> {
        let mut descriptor = self.mesh_pipeline.specialize(key, layout)?;

        descriptor.vertex.shader = self.shader.clone();
        descriptor.vertex.buffers.push(VertexBufferLayout {
            array_stride: std::mem::size_of::<InstanceData>() as u64,
            step_mode: VertexStepMode::Instance,
            attributes: vec![
                VertexAttribute {
                    format: VertexFormat::Float32x4,
                    offset: 0,
                    shader_location: 3, // shader locations 0-2 are taken up by Position, Normal and UV attributes
                },
                VertexAttribute {
                    format: VertexFormat::Float32x4,
                    offset: VertexFormat::Float32x4.size(),
                    shader_location: 4,
                },
            ],
        });
        descriptor.fragment.as_mut().unwrap().shader = self.shader.clone();
        Ok(descriptor)
    }
}

type DrawCustom = (
    SetItemPipeline,
    SetMeshViewBindGroup<0>,
    SetMeshBindGroup<1>,
    DrawMeshInstanced,
);

struct DrawMeshInstanced;

impl<P: PhaseItem> RenderCommand<P> for DrawMeshInstanced {
    type Param = (SRes<RenderAssets<Mesh>>, SRes<RenderMeshInstances>);
    type ViewQuery = ();
    type ItemQuery = Read<InstanceBuffer>;

    #[inline]
    fn render<'w>(
        item: &P,
        _view: (),
        instance_buffer: Option<&'w InstanceBuffer>,
        (meshes, render_mesh_instances): SystemParamItem<'w, '_, Self::Param>,
        pass: &mut TrackedRenderPass<'w>,
    ) -> RenderCommandResult {
        let Some(mesh_instance) = render_mesh_instances.get(&item.entity()) else {
            return RenderCommandResult::Failure;
        };
        let Some(gpu_mesh) = meshes.into_inner().get(mesh_instance.mesh_asset_id) else {
            return RenderCommandResult::Failure;
        };
        let Some(instance_buffer) = instance_buffer else {
            return RenderCommandResult::Failure;
        };

        pass.set_vertex_buffer(0, gpu_mesh.vertex_buffer.slice(..));
        pass.set_vertex_buffer(1, instance_buffer.buffer.slice(..));

        match &gpu_mesh.buffer_info {
            GpuBufferInfo::Indexed {
                buffer,
                index_format,
                count,
            } => {
                pass.set_index_buffer(buffer.slice(..), 0, *index_format);
                pass.draw_indexed(0..*count, 0, 0..instance_buffer.length as u32);
            }
            GpuBufferInfo::NonIndexed => {
                pass.draw(0..gpu_mesh.vertex_count, 0..instance_buffer.length as u32);
            }
        }
        RenderCommandResult::Success
    }
}
```