
# Objective

- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs

## Solution

- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer, where the left 32 bits encode depth, and the right 32 bits encode the existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga doesn't support atomic ops on textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per triangle) to raster, the culling pass now writes out a buffer of just cluster IDs (per cluster, so less memory allocated, cheaper to write out).
  - Clusters for software raster are allocated from the left side
  - Clusters for hardware raster are allocated in the same buffer, from the right side
  - The buffer size is fixed at MeshletPlugin build time, and should be set to a reasonable value for your scene (no warning on overflow, and no good way to determine what value you need outside of renderdoc - I plan to fix this in a future PR adding a meshlet stats overlay)
  - Currently I don't have a heuristic for software vs hardware raster selection for each cluster. The existing code is just a placeholder. I need to profile on a release scene and come up with a heuristic, probably in a future PR.
  - The culling shader is getting pretty hard to follow at this point, but I don't want to spend time improving it as the entire shader/pass is getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup loads and transforms the <=64 vertices of the cluster, and then rasterizes the <=64 triangles of the cluster.
  - Two variants are implemented: Scanline for clusters with any larger triangles (still smaller than hardware is good at), and brute-force for very very tiny triangles
  - Once the shader determines that a pixel should be filled in, it does an atomicMax() on the visbuffer to store the results, copying how Nanite works
  - On devices with a low max workgroups per dispatch limit, an extra compute pass is inserted before software raster to convert from a 1d to 2d dispatch (I don't think 3d would ever be necessary).
  - I haven't implemented the top-left rule or subpixel precision yet, I'm leaving that for a future PR since I get usable results without it for now
  - Resources used: https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters 6-8 of https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet, instead of the actual meshlet vertex count. Extra invocations just early-exit.
  - While this is slower than the existing system, hardware draws should be rare now that software raster is usable, and it saves a ton of memory using the unified cluster ID buffer. This would be fixed if wgpu had support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster pass also does the same atomic visbuffer writes that software raster uses.
  - We have to bind a dummy render target anyways, as wgpu doesn't currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization passes.
- If we had async compute queues, we could overlap the software and hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer node, and write out view depth and material ID depth textures

### Misc changes

- Fixed cluster culling importing, but never actually using the previous view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet mesh
- Split up gpu_scene module into meshlet_mesh_manager, instance_manager, and resource_manager
  - resource_manager is still too complex and inefficient (extract and prepare are way too expensive). I plan on improving this in a future PR, but for now ResourceManager is mostly a 1:1 port of the leftover MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material shade pass, as well as some other misc renaming (in the future, these will be compute shaders even, and not actual draw calls)

---

## Migration Guide

- TBD (ask me at the end of the release for meshlet changes as a whole)

---------

Co-authored-by: vero <email@atlasdostal.com>
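The 64bit visbuffer entry and the atomicMax() write described above can be sketched on the CPU as follows. This is a minimal illustration, not the shader from this PR: the names `pack_visibility`, `unpack_visibility`, and `write_pixel` are hypothetical, the 6-bit triangle field is an assumption based on the <=64 triangles per cluster, and it assumes a non-negative reverse-Z depth so that a larger bit pattern means a closer fragment.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Bits reserved for the triangle index.
/// Assumption for this sketch: a cluster holds at most 64 triangles.
const TRIANGLE_BITS: u32 = 6;

/// Pack depth into the upper 32 bits and cluster + triangle IDs into the lower 32.
/// Assumes `depth >= 0.0` (reverse-Z), where the f32 bit pattern orders like the value.
pub fn pack_visibility(depth: f32, cluster_id: u32, triangle_id: u32) -> u64 {
    debug_assert!(depth >= 0.0);
    debug_assert!(triangle_id < (1 << TRIANGLE_BITS));
    let ids = (cluster_id << TRIANGLE_BITS) | triangle_id;
    ((depth.to_bits() as u64) << 32) | ids as u64
}

/// Inverse of `pack_visibility`: returns (depth, cluster ID, triangle ID).
pub fn unpack_visibility(packed: u64) -> (f32, u32, u32) {
    let depth = f32::from_bits((packed >> 32) as u32);
    let ids = packed as u32;
    (depth, ids >> TRIANGLE_BITS, ids & ((1 << TRIANGLE_BITS) - 1))
}

/// Depth test and ID write in a single atomic max: because depth sits in the
/// high bits, the largest packed value is the closest fragment.
pub fn write_pixel(
    visbuffer: &[AtomicU64],
    pixel: usize,
    depth: f32,
    cluster_id: u32,
    triangle_id: u32,
) {
    let packed = pack_visibility(depth, cluster_id, triangle_id);
    visbuffer[pixel].fetch_max(packed, Ordering::Relaxed);
}
```

Because the comparison happens on the whole 64-bit word, no separate depth buffer or lock is needed; the IDs only act as a tie-breaker when two fragments have identical depth.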
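The shared cluster ID buffer, with software raster clusters allocated from the left and hardware raster clusters from the right, might look like the sketch below. `RasterClusterBuffer` and its methods are hypothetical names, and the counters are plain integers where a culling shader would presumably use atomic counters. Unlike the behavior described above (no warning on overflow), this sketch reports a full buffer by returning `false`.

```rust
/// Fixed-size buffer of cluster IDs filled from both ends.
/// Hypothetical CPU-side sketch; not the GPU buffer layout from this PR.
pub struct RasterClusterBuffer {
    cluster_ids: Vec<u32>,
    software_count: usize,
    hardware_count: usize,
}

impl RasterClusterBuffer {
    /// The capacity is fixed up front, mirroring the fixed buffer size.
    pub fn new(capacity: usize) -> Self {
        Self {
            cluster_ids: vec![u32::MAX; capacity],
            software_count: 0,
            hardware_count: 0,
        }
    }

    /// True while the two allocation fronts have not met.
    fn has_room(&self) -> bool {
        self.software_count + self.hardware_count < self.cluster_ids.len()
    }

    /// Append a cluster for software raster, growing from the left.
    pub fn push_software(&mut self, cluster_id: u32) -> bool {
        if !self.has_room() {
            return false;
        }
        self.cluster_ids[self.software_count] = cluster_id;
        self.software_count += 1;
        true
    }

    /// Append a cluster for hardware raster, growing from the right.
    pub fn push_hardware(&mut self, cluster_id: u32) -> bool {
        if !self.has_room() {
            return false;
        }
        let last = self.cluster_ids.len() - 1;
        self.cluster_ids[last - self.hardware_count] = cluster_id;
        self.hardware_count += 1;
        true
    }

    /// Clusters queued for software raster, in insertion order.
    pub fn software_clusters(&self) -> &[u32] {
        &self.cluster_ids[..self.software_count]
    }

    /// Clusters queued for hardware raster (most recently pushed first).
    pub fn hardware_clusters(&self) -> &[u32] {
        &self.cluster_ids[self.cluster_ids.len() - self.hardware_count..]
    }
}
```

Sharing one allocation means the split between the two raster paths can vary per frame without reserving worst-case space for each path separately.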
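The 1d to 2d dispatch conversion mentioned above can be illustrated with a small sketch. The function names and the exact split are assumptions made for illustration (the PR's remap pass may choose the dimensions differently); the idea is only that a workgroup count exceeding the per-dimension limit is spread over two dimensions, and each workgroup recovers its linear index and exits early if it is padding.

```rust
/// Split a 1D workgroup count into an (x, y) dispatch that respects a
/// per-dimension limit. Returns `None` if two dimensions are not enough.
/// Hypothetical helper; the split strategy is an assumption of this sketch.
pub fn remap_1d_to_2d(workgroup_count: u32, max_per_dimension: u32) -> Option<(u32, u32)> {
    if max_per_dimension == 0 {
        return None;
    }
    if workgroup_count <= max_per_dimension {
        return Some((workgroup_count, 1));
    }
    // Ceiling division without risking overflow.
    let rows = workgroup_count / max_per_dimension
        + u32::from(workgroup_count % max_per_dimension != 0);
    (rows <= max_per_dimension).then_some((max_per_dimension, rows))
}

/// Recover the linear workgroup index inside the 2D dispatch.
/// Returns `None` for padding workgroups in the last row, which should early-exit.
pub fn linear_workgroup_index(
    id_x: u32,
    id_y: u32,
    dispatch_x: u32,
    workgroup_count: u32,
) -> Option<u32> {
    let index = id_y * dispatch_x + id_x;
    (index < workgroup_count).then_some(index)
}
```

For example, with a limit of 65535 workgroups per dimension, 70000 clusters become a 65535 x 2 dispatch, and the trailing workgroups of the second row find no cluster to process.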
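As a rough picture of the brute-force variant for tiny triangles, the sketch below tests every pixel center in a triangle's bounding box against three edge functions, in the style of the resources linked above. It is not the shader from this PR: `edge` and `rasterize_brute_force` are hypothetical names, the winding convention (positive signed area is front-facing) is an assumption, and, like the PR, it applies neither the top-left rule nor subpixel precision.

```rust
/// Signed area of the parallelogram spanned by (b - a) and (p - a).
/// Its sign tells on which side of the edge a -> b the point p lies.
pub fn edge(a: (f32, f32), b: (f32, f32), p: (f32, f32)) -> f32 {
    (b.0 - a.0) * (p.1 - a.1) - (b.1 - a.1) * (p.0 - a.0)
}

/// Call `fill(x, y)` for every pixel whose center lies inside the triangle.
/// Assumption: triangles with positive signed area are front-facing.
pub fn rasterize_brute_force(
    v: [(f32, f32); 3],
    width: u32,
    height: u32,
    mut fill: impl FnMut(u32, u32),
) {
    // Skip degenerate and back-facing triangles.
    if edge(v[0], v[1], v[2]) <= 0.0 {
        return;
    }

    // Bounding box of the triangle, clamped to the viewport.
    let min_of = |f: fn(&(f32, f32)) -> f32| v.iter().map(f).fold(f32::INFINITY, f32::min);
    let max_of = |f: fn(&(f32, f32)) -> f32| v.iter().map(f).fold(f32::NEG_INFINITY, f32::max);
    let min_x = min_of(|p| p.0).floor().max(0.0) as u32;
    let min_y = min_of(|p| p.1).floor().max(0.0) as u32;
    let max_x = max_of(|p| p.0).ceil().max(0.0).min(width as f32) as u32;
    let max_y = max_of(|p| p.1).ceil().max(0.0).min(height as f32) as u32;

    for y in min_y..max_y {
        for x in min_x..max_x {
            // Sample at the pixel center.
            let p = (x as f32 + 0.5, y as f32 + 0.5);
            let w0 = edge(v[1], v[2], p);
            let w1 = edge(v[2], v[0], p);
            let w2 = edge(v[0], v[1], p);
            // No top-left rule: pixels exactly on a shared edge may be filled twice.
            if w0 >= 0.0 && w1 >= 0.0 && w2 >= 0.0 {
                fill(x, y);
            }
        }
    }
}
```

For triangles covering only a handful of pixels, the bounding box is nearly as small as the triangle itself, so testing every pixel in it wastes little work compared with setting up a scanline traversal.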
262 lines
9.5 KiB
Rust
use super::{meshlet_mesh_manager::MeshletMeshManager, MeshletMesh};
use crate::{
    Material, MeshFlags, MeshTransforms, MeshUniform, NotShadowCaster, NotShadowReceiver,
    PreviousGlobalTransform, RenderMaterialInstances,
};
use bevy_asset::{AssetEvent, AssetServer, Assets, Handle, UntypedAssetId};
use bevy_ecs::{
    entity::{Entities, Entity, EntityHashMap},
    event::EventReader,
    query::Has,
    system::{Local, Query, Res, ResMut, Resource, SystemState},
};
use bevy_render::{render_resource::StorageBuffer, view::RenderLayers, MainWorld};
use bevy_transform::components::GlobalTransform;
use bevy_utils::{HashMap, HashSet};
use std::ops::{DerefMut, Range};

/// Manages data for each entity with a [`MeshletMesh`].
#[derive(Resource)]
pub struct InstanceManager {
    /// Amount of clusters in the scene (sum of all meshlet counts across all instances)
    pub scene_cluster_count: u32,

    /// Per-instance [`Entity`], [`RenderLayers`], and [`NotShadowCaster`]
    pub instances: Vec<(Entity, RenderLayers, bool)>,
    /// Per-instance [`MeshUniform`]
    pub instance_uniforms: StorageBuffer<Vec<MeshUniform>>,
    /// Per-instance material ID
    pub instance_material_ids: StorageBuffer<Vec<u32>>,
    /// Prefix-sum of meshlet counts per instance
    pub instance_meshlet_counts_prefix_sum: StorageBuffer<Vec<u32>>,
    /// Per-instance index to the start of the instance's slice of the meshlets buffer
    pub instance_meshlet_slice_starts: StorageBuffer<Vec<u32>>,
    /// Per-view per-instance visibility bit. Used for [`RenderLayers`] and [`NotShadowCaster`] support.
    pub view_instance_visibility: EntityHashMap<StorageBuffer<Vec<u32>>>,

    /// Next material ID available for a [`Material`]
    next_material_id: u32,
    /// Map of [`Material`] to material ID
    material_id_lookup: HashMap<UntypedAssetId, u32>,
    /// Set of material IDs used in the scene
    material_ids_present_in_scene: HashSet<u32>,
}

impl InstanceManager {
    /// Create an empty manager with labeled, empty GPU buffers.
    pub fn new() -> Self {
        Self {
            scene_cluster_count: 0,

            instances: Vec::new(),
            instance_uniforms: {
                let mut buffer = StorageBuffer::default();
                buffer.set_label(Some("meshlet_instance_uniforms"));
                buffer
            },
            instance_material_ids: {
                let mut buffer = StorageBuffer::default();
                buffer.set_label(Some("meshlet_instance_material_ids"));
                buffer
            },
            instance_meshlet_counts_prefix_sum: {
                let mut buffer = StorageBuffer::default();
                buffer.set_label(Some("meshlet_instance_meshlet_counts_prefix_sum"));
                buffer
            },
            instance_meshlet_slice_starts: {
                let mut buffer = StorageBuffer::default();
                buffer.set_label(Some("meshlet_instance_meshlet_slice_starts"));
                buffer
            },
            view_instance_visibility: EntityHashMap::default(),

            next_material_id: 0,
            material_id_lookup: HashMap::new(),
            material_ids_present_in_scene: HashSet::new(),
        }
    }

    /// Append the per-instance data for `instance` and grow the scene cluster
    /// count by the number of meshlets in `meshlets_slice`.
    #[allow(clippy::too_many_arguments)]
    pub fn add_instance(
        &mut self,
        instance: Entity,
        meshlets_slice: Range<u32>,
        transform: &GlobalTransform,
        previous_transform: Option<&PreviousGlobalTransform>,
        render_layers: Option<&RenderLayers>,
        not_shadow_receiver: bool,
        not_shadow_caster: bool,
    ) {
        // Build a MeshUniform for the instance
        let transform = transform.affine();
        let previous_transform = previous_transform.map(|t| t.0).unwrap_or(transform);
        let mut flags = if not_shadow_receiver {
            MeshFlags::empty()
        } else {
            MeshFlags::SHADOW_RECEIVER
        };
        if transform.matrix3.determinant().is_sign_positive() {
            flags |= MeshFlags::SIGN_DETERMINANT_MODEL_3X3;
        }
        let transforms = MeshTransforms {
            world_from_local: (&transform).into(),
            previous_world_from_local: (&previous_transform).into(),
            flags: flags.bits(),
        };
        let mesh_uniform = MeshUniform::new(&transforms, 0, None);

        // Append instance data
        self.instances.push((
            instance,
            render_layers.cloned().unwrap_or_default(),
            not_shadow_caster,
        ));
        self.instance_uniforms.get_mut().push(mesh_uniform);
        // Placeholder material ID; overwritten later in queue_material_meshlet_meshes
        self.instance_material_ids.get_mut().push(0);
        // Exclusive prefix sum: the cluster count before this instance is added
        self.instance_meshlet_counts_prefix_sum
            .get_mut()
            .push(self.scene_cluster_count);
        self.instance_meshlet_slice_starts
            .get_mut()
            .push(meshlets_slice.start);

        self.scene_cluster_count += meshlets_slice.end - meshlets_slice.start;
    }

    /// Get the material ID for a [`crate::Material`], allocating the next free ID
    /// (starting at 1) the first time a material is seen.
    pub fn get_material_id(&mut self, material_asset_id: UntypedAssetId) -> u32 {
        *self
            .material_id_lookup
            .entry(material_asset_id)
            .or_insert_with(|| {
                self.next_material_id += 1;
                self.next_material_id
            })
    }

    /// Whether at least one instance in the scene uses the given material ID.
    pub fn material_present_in_scene(&self, material_id: &u32) -> bool {
        self.material_ids_present_in_scene.contains(material_id)
    }

    /// Clear all per-frame data. Per-view visibility buffers are kept (but emptied)
    /// only for view entities that still exist.
    pub fn reset(&mut self, entities: &Entities) {
        self.scene_cluster_count = 0;

        self.instances.clear();
        self.instance_uniforms.get_mut().clear();
        self.instance_material_ids.get_mut().clear();
        self.instance_meshlet_counts_prefix_sum.get_mut().clear();
        self.instance_meshlet_slice_starts.get_mut().clear();
        self.view_instance_visibility
            .retain(|view_entity, _| entities.contains(*view_entity));
        self.view_instance_visibility
            .values_mut()
            .for_each(|b| b.get_mut().clear());

        self.next_material_id = 0;
        self.material_id_lookup.clear();
        self.material_ids_present_in_scene.clear();
    }
}

pub fn extract_meshlet_mesh_entities(
    mut meshlet_mesh_manager: ResMut<MeshletMeshManager>,
    mut instance_manager: ResMut<InstanceManager>,
    // TODO: Replace main_world and system_state when Extract<ResMut<Assets<MeshletMesh>>> is possible
    mut main_world: ResMut<MainWorld>,
    mut system_state: Local<
        Option<
            SystemState<(
                Query<(
                    Entity,
                    &Handle<MeshletMesh>,
                    &GlobalTransform,
                    Option<&PreviousGlobalTransform>,
                    Option<&RenderLayers>,
                    Has<NotShadowReceiver>,
                    Has<NotShadowCaster>,
                )>,
                Res<AssetServer>,
                ResMut<Assets<MeshletMesh>>,
                EventReader<AssetEvent<MeshletMesh>>,
                &Entities,
            )>,
        >,
    >,
) {
    // Get instances query
    if system_state.is_none() {
        *system_state = Some(SystemState::new(&mut main_world));
    }
    let system_state = system_state.as_mut().unwrap();
    let (instances_query, asset_server, mut assets, mut asset_events, entities) =
        system_state.get_mut(&mut main_world);

    // Reset per-frame data
    instance_manager.reset(entities);

    // Free GPU buffer space for any modified or dropped MeshletMesh assets
    for asset_event in asset_events.read() {
        if let AssetEvent::Unused { id } | AssetEvent::Modified { id } = asset_event {
            meshlet_mesh_manager.remove(id);
        }
    }

    // Iterate over every instance
    for (
        instance,
        meshlet_mesh,
        transform,
        previous_transform,
        render_layers,
        not_shadow_receiver,
        not_shadow_caster,
    ) in &instances_query
    {
        // Skip instances with an unloaded MeshletMesh asset
        // TODO: This is a semi-expensive check
        if asset_server.is_managed(meshlet_mesh.id())
            && !asset_server.is_loaded_with_dependencies(meshlet_mesh.id())
        {
            continue;
        }

        // Upload the instance's MeshletMesh asset data if not already done
        let meshlets_slice =
            meshlet_mesh_manager.queue_upload_if_needed(meshlet_mesh.id(), &mut assets);

        // Add the instance's data to the instance manager
        instance_manager.add_instance(
            instance,
            meshlets_slice,
            transform,
            previous_transform,
            render_layers,
            not_shadow_receiver,
            not_shadow_caster,
        );
    }
}

/// For each entity in the scene, record what material ID its material was assigned in the `prepare_material_meshlet_meshes` systems,
/// and note that the material is used by at least one entity in the scene.
pub fn queue_material_meshlet_meshes<M: Material>(
    mut instance_manager: ResMut<InstanceManager>,
    render_material_instances: Res<RenderMaterialInstances<M>>,
) {
    let instance_manager = instance_manager.deref_mut();

    for (i, (instance, _, _)) in instance_manager.instances.iter().enumerate() {
        if let Some(material_asset_id) = render_material_instances.get(instance) {
            if let Some(material_id) = instance_manager
                .material_id_lookup
                .get(&material_asset_id.untyped())
            {
                instance_manager
                    .material_ids_present_in_scene
                    .insert(*material_id);
                instance_manager.instance_material_ids.get_mut()[i] = *material_id;
            }
        }
    }
}
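
The `instance_meshlet_counts_prefix_sum` buffer built by `add_instance` stores, for each instance, the number of clusters that precede it. One way such a prefix sum can be consumed is to map a scene-wide cluster ID back to its instance with a binary search. The sketch below is a CPU-side illustration under that assumption; `exclusive_prefix_sum` and `instance_for_cluster` are hypothetical helpers, and the shaders that read this buffer are not part of this file.

```rust
/// Build the same exclusive prefix sum that `add_instance` accumulates:
/// entry `i` is the total meshlet count of all instances before `i`.
pub fn exclusive_prefix_sum(meshlet_counts: &[u32]) -> Vec<u32> {
    let mut running = 0;
    meshlet_counts
        .iter()
        .map(|count| {
            let start = running;
            running += count;
            start
        })
        .collect()
}

/// Find the instance that owns `cluster_id` by binary search.
/// The caller is assumed to pass a cluster ID below the scene cluster count;
/// larger IDs are attributed to the last instance.
pub fn instance_for_cluster(prefix_sum: &[u32], cluster_id: u32) -> Option<usize> {
    // Index of the first instance starting after `cluster_id`, minus one.
    prefix_sum
        .partition_point(|&start| start <= cluster_id)
        .checked_sub(1)
}
```

Subtracting the instance's prefix-sum entry from the cluster ID then gives the cluster's offset within the instance, which can be added to the matching `instance_meshlet_slice_starts` entry to locate the meshlet.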