bevy/crates/bevy_pbr/src/meshlet/resolve_render_targets.wgsl
JMS55 a0faf9cd01
More triangles/vertices per meshlet (#15023)
### Builder changes
- Increased meshlet max vertices/triangles from 64v/64t to 255v/128t
(meshoptimizer won't allow 256v sadly). This gives us a much greater
percentage of meshlets with max triangle count (128). Still not perfect,
we still end up with some tiny <=10 triangle meshlets that never really
get simplified, but it's progress.
- Removed the error target limit. Now we allow meshoptimizer to simplify
as much as possible. No reason to cap this out, as the cluster culling
code will choose a good LOD level anyways. Again leads to higher quality
LOD trees.
- After some discussion and consulting the Nanite slides again, changed
meshlet group error from _adding_ the max child's error to the group
error, to doing `group_error = max(group_error, max_child_error)`. Error
is already cumulative between LODs as the edges we're collapsing during
simplification get longer each time.
- Bumped the 65% simplification threshold to allow up to 95% of the
original geometry (e.g. accept simplification as valid even if we only
simplified 5% of the triangles). This gives us closer to
log2(initial_meshlet_count) LOD levels, and fewer meshlet roots in the
DAG.

Still more work to be done in the future here. Maybe trying METIS for
meshlet building instead of meshoptimizer.

Using ~8 clusters per group instead of ~4 might also make a big
difference. The Nanite slides say that they have 8-32 meshlets per
group, suggesting some kind of heuristic. Unfortunately meshopt's
compute_cluster_bounds won't work with large groups atm
(https://github.com/zeux/meshoptimizer/discussions/750#discussioncomment-10562641)
so hard to test.

Based on discussion from
https://github.com/bevyengine/bevy/discussions/14998,
https://github.com/zeux/meshoptimizer/discussions/750, and discord.

### Runtime changes
- cluster:triangle packed IDs are now stored 25:7 instead of 26:6 bits,
as max triangles per cluster are now 128 instead of 64
- Hardware raster now spawns 128 * 3 vertices instead of 64 * 3 vertices
to account for the new max triangles limit
- Hardware raster now outputs NaN triangles (0 / 0) instead of
zero-positioned triangles for extra vertex invocations over the cluster
triangle count. Shouldn't really be a difference idt, but I did it
anyways.
- Software raster now does 128 threads per workgroup instead of 64
threads. Each thread now loads, projects, and caches a vertex (vertices
0-127), and then if needed does so again (vertices 128-254). Each thread
then rasterizes one of 128 triangles.
- Fixed a bug with `needs_dispatch_remap`. I had the condition backwards
in my last PR, I probably committed it by accident after testing the
non-default code path on my GPU.
2024-09-08 17:55:57 +00:00

40 lines
1.6 KiB
WebGPU Shading Language

#import bevy_core_pipeline::fullscreen_vertex_shader::FullscreenVertexOutput
#ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
@group(0) @binding(0) var<storage, read> meshlet_visibility_buffer: array<u64>; // Per pixel
#else
@group(0) @binding(0) var<storage, read> meshlet_visibility_buffer: array<u32>; // Per pixel
#endif
@group(0) @binding(1) var<storage, read> meshlet_cluster_instance_ids: array<u32>; // Per cluster
@group(0) @binding(2) var<storage, read> meshlet_instance_material_ids: array<u32>; // Per entity instance
var<push_constant> view_width: u32;
/// This pass writes out the depth texture.
@fragment
fn resolve_depth(in: FullscreenVertexOutput) -> @builtin(frag_depth) f32 {
let frag_coord_1d = u32(in.position.y) * view_width + u32(in.position.x);
let visibility = meshlet_visibility_buffer[frag_coord_1d];
#ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
return bitcast<f32>(u32(visibility >> 32u));
#else
return bitcast<f32>(visibility);
#endif
}
/// This pass writes out the material depth texture.
#ifdef MESHLET_VISIBILITY_BUFFER_RASTER_PASS_OUTPUT
@fragment
fn resolve_material_depth(in: FullscreenVertexOutput) -> @builtin(frag_depth) f32 {
let frag_coord_1d = u32(in.position.y) * view_width + u32(in.position.x);
let visibility = meshlet_visibility_buffer[frag_coord_1d];
let depth = visibility >> 32u;
if depth == 0lu { return 0.0; }
let cluster_id = u32(visibility) >> 7u;
let instance_id = meshlet_cluster_instance_ids[cluster_id];
let material_id = meshlet_instance_material_ids[instance_id];
return f32(material_id) / 65535.0;
}
#endif