bevy/crates/bevy_pbr/src/render/clustered_forward.wgsl
Patrick Walton b7bcd313ca
Cluster light probes using conservative spherical bounds. (#13746)
This commit allows the Bevy renderer to use the clustering
infrastructure for light probes (reflection probes and irradiance
volumes) on platforms where at least 3 storage buffers are available. On
such platforms (the vast majority), we stop performing brute-force
searches of light probes for each fragment and instead only search the
light probes with bounding spheres that intersect the current cluster.
This should dramatically improve scalability of irradiance volumes and
reflection probes.

The primary platform that doesn't support 3 storage buffers is WebGL 2,
and we continue using a brute-force search of light probes on that
platform, as the UBO that stores per-cluster indices is too small to fit
the light probe counts. Note, however, that that platform also doesn't
support bindless textures (indeed, it would be very odd for a platform
to support bindless textures but not SSBOs), so we only support one of
each type of light probe per drawcall there in the first place.
Consequently, this isn't a performance problem, as the search will only
have one light probe to consider. (In fact, clustering would probably
end up being a performance loss.)

Known potential improvements include:

1. We currently cull based on a conservative bounding sphere test and
not based on the oriented bounding box (OBB) of the light probe. This is
improvable, but in the interests of simplicity, I opted to keep the
bounding sphere test for now. The OBB improvement can be a follow-up.

2. This patch doesn't change the fact that each fragment only takes a
single light probe into account. Typical light probe implementations
detect the case in which multiple light probes cover the current
fragment and perform some sort of weighted blend between them. As the
light probe fetch function presently returns only a single light probe,
implementing that feature would require more code restructuring, so I
left it out for now. It can be added as a follow-up.

3. Light probe implementations typically have a falloff range. Although
this is a wanted feature in Bevy, this particular commit also doesn't
implement that feature, as it's out of scope.

4. This commit doesn't raise the maximum number of light probes past its
current value of 8 for each type. This should be addressed later, but
would possibly require more bindings on platforms with storage buffers,
which would increase this patch's complexity. Even without raising the
limit, this patch should constitute a significant performance
improvement for scenes that get anywhere close to this limit. In the
interest of keeping this patch small, I opted to leave raising the limit
to a follow-up.

## Changelog

### Changed

* Light probes (reflection probes and irradiance volumes) are now
clustered on most platforms, improving performance when many light
probes are present.

---------

Co-authored-by: Benjamin Brienen <Benjamin.Brienen@outlook.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-12-05 13:07:10 +00:00

191 lines
8.5 KiB
WebGPU Shading Language

#define_import_path bevy_pbr::clustered_forward
#import bevy_pbr::{
mesh_view_bindings as bindings,
utils::rand_f,
}
#import bevy_render::{
color_operations::hsv_to_rgb,
maths::PI_2,
}
// Offsets within the `cluster_offsets_and_counts` buffer for a single cluster.
//
// These offsets must be monotonically nondecreasing. That is, indices are
// always sorted into the following order: point lights, spot lights, reflection
// probes, irradiance volumes.
struct ClusterableObjectIndexRanges {
// The offset of the index of the first point light.
first_point_light_index_offset: u32,
// The offset of the index of the first spot light, which also terminates
// the list of point lights.
first_spot_light_index_offset: u32,
// The offset of the index of the first reflection probe, which also
// terminates the list of spot lights.
first_reflection_probe_index_offset: u32,
// The offset of the index of the first irradiance volumes, which also
// terminates the list of reflection probes.
first_irradiance_volume_index_offset: u32,
// One past the offset of the index of the final clusterable object for this
// cluster.
last_clusterable_object_index_offset: u32,
}
// NOTE: Keep in sync with bevy_pbr/src/light.rs
fn view_z_to_z_slice(view_z: f32, is_orthographic: bool) -> u32 {
var z_slice: u32 = 0u;
if is_orthographic {
// NOTE: view_z is correct in the orthographic case
z_slice = u32(floor((view_z - bindings::lights.cluster_factors.z) * bindings::lights.cluster_factors.w));
} else {
// NOTE: had to use -view_z to make it positive else log(negative) is nan
z_slice = u32(log(-view_z) * bindings::lights.cluster_factors.z - bindings::lights.cluster_factors.w + 1.0);
}
// NOTE: We use min as we may limit the far z plane used for clustering to be closer than
// the furthest thing being drawn. This means that we need to limit to the maximum cluster.
return min(z_slice, bindings::lights.cluster_dimensions.z - 1u);
}
fn fragment_cluster_index(frag_coord: vec2<f32>, view_z: f32, is_orthographic: bool) -> u32 {
let xy = vec2<u32>(floor((frag_coord - bindings::view.viewport.xy) * bindings::lights.cluster_factors.xy));
let z_slice = view_z_to_z_slice(view_z, is_orthographic);
// NOTE: Restricting cluster index to avoid undefined behavior when accessing uniform buffer
// arrays based on the cluster index.
return min(
(xy.y * bindings::lights.cluster_dimensions.x + xy.x) * bindings::lights.cluster_dimensions.z + z_slice,
bindings::lights.cluster_dimensions.w - 1u
);
}
// this must match CLUSTER_COUNT_SIZE in light.rs
const CLUSTER_COUNT_SIZE = 9u;
// Returns the indices of clusterable objects belonging to the given cluster.
//
// Note that if fewer than 3 SSBO bindings are available (in WebGL 2,
// primarily), light probes aren't clustered, and therefore both light probe
// index ranges will be empty.
fn unpack_clusterable_object_index_ranges(cluster_index: u32) -> ClusterableObjectIndexRanges {
#if AVAILABLE_STORAGE_BUFFER_BINDINGS >= 3
let offset_and_counts_a = bindings::cluster_offsets_and_counts.data[cluster_index][0];
let offset_and_counts_b = bindings::cluster_offsets_and_counts.data[cluster_index][1];
// Sum up the counts to produce the range brackets.
//
// We could have stored the range brackets in `cluster_offsets_and_counts`
// directly, but doing it this way makes the logic in this path more
// consistent with the WebGL 2 path below.
let point_light_offset = offset_and_counts_a.x;
let spot_light_offset = point_light_offset + offset_and_counts_a.y;
let reflection_probe_offset = spot_light_offset + offset_and_counts_a.z;
let irradiance_volume_offset = reflection_probe_offset + offset_and_counts_a.w;
let last_clusterable_offset = irradiance_volume_offset + offset_and_counts_b.x;
return ClusterableObjectIndexRanges(
point_light_offset,
spot_light_offset,
reflection_probe_offset,
irradiance_volume_offset,
last_clusterable_offset
);
#else // AVAILABLE_STORAGE_BUFFER_BINDINGS >= 3
let raw_offset_and_counts = bindings::cluster_offsets_and_counts.data[cluster_index >> 2u][cluster_index & ((1u << 2u) - 1u)];
// [ 31 .. 18 | 17 .. 9 | 8 .. 0 ]
// [ offset | point light count | spot light count ]
let offset_and_counts = vec3<u32>(
(raw_offset_and_counts >> (CLUSTER_COUNT_SIZE * 2u)) & ((1u << (32u - (CLUSTER_COUNT_SIZE * 2u))) - 1u),
(raw_offset_and_counts >> CLUSTER_COUNT_SIZE) & ((1u << CLUSTER_COUNT_SIZE) - 1u),
raw_offset_and_counts & ((1u << CLUSTER_COUNT_SIZE) - 1u),
);
// We don't cluster reflection probes or irradiance volumes on this
// platform, as there's no room in the UBO. Thus, those offset ranges
// (corresponding to `offset_d` and `offset_e` above) are empty and are
// simply copies of `offset_c`.
let offset_a = offset_and_counts.x;
let offset_b = offset_a + offset_and_counts.y;
let offset_c = offset_b + offset_and_counts.z;
return ClusterableObjectIndexRanges(offset_a, offset_b, offset_c, offset_c, offset_c);
#endif // AVAILABLE_STORAGE_BUFFER_BINDINGS >= 3
}
// Returns the index of the clusterable object at the given offset.
//
// Note that, in the case of a light probe, the index refers to an element in
// one of the two `light_probes` sublists, not the `clusterable_objects` list.
fn get_clusterable_object_id(index: u32) -> u32 {
#if AVAILABLE_STORAGE_BUFFER_BINDINGS >= 3
return bindings::clusterable_object_index_lists.data[index];
#else
// The index is correct but in clusterable_object_index_lists we pack 4 u8s into a u32
// This means the index into clusterable_object_index_lists is index / 4
let indices = bindings::clusterable_object_index_lists.data[index >> 4u][(index >> 2u) &
((1u << 2u) - 1u)];
// And index % 4 gives the sub-index of the u8 within the u32 so we shift by 8 * sub-index
return (indices >> (8u * (index & ((1u << 2u) - 1u)))) & ((1u << 8u) - 1u);
#endif
}
fn cluster_debug_visualization(
input_color: vec4<f32>,
view_z: f32,
is_orthographic: bool,
clusterable_object_index_ranges: ClusterableObjectIndexRanges,
cluster_index: u32,
) -> vec4<f32> {
var output_color = input_color;
// Cluster allocation debug (using 'over' alpha blending)
#ifdef CLUSTERED_FORWARD_DEBUG_Z_SLICES
// NOTE: This debug mode visualizes the z-slices
let cluster_overlay_alpha = 0.1;
var z_slice: u32 = view_z_to_z_slice(view_z, is_orthographic);
// A hack to make the colors alternate a bit more
if (z_slice & 1u) == 1u {
z_slice = z_slice + bindings::lights.cluster_dimensions.z / 2u;
}
let slice_color_hsv = vec3(
f32(z_slice) / f32(bindings::lights.cluster_dimensions.z + 1u) * PI_2,
1.0,
0.5
);
let slice_color = hsv_to_rgb(slice_color_hsv);
output_color = vec4<f32>(
(1.0 - cluster_overlay_alpha) * output_color.rgb + cluster_overlay_alpha * slice_color,
output_color.a
);
#endif // CLUSTERED_FORWARD_DEBUG_Z_SLICES
#ifdef CLUSTERED_FORWARD_DEBUG_CLUSTER_COMPLEXITY
// NOTE: This debug mode visualizes the number of clusterable objects within
// the cluster that contains the fragment. It shows a sort of cluster
// complexity measure.
let cluster_overlay_alpha = 0.1;
let max_complexity_per_cluster = 64.0;
let object_count = clusterable_object_index_ranges.first_reflection_probe_index_offset -
clusterable_object_index_ranges.first_point_light_index_offset;
output_color.r = (1.0 - cluster_overlay_alpha) * output_color.r + cluster_overlay_alpha *
smoothstep(0.0, max_complexity_per_cluster, f32(object_count));
output_color.g = (1.0 - cluster_overlay_alpha) * output_color.g + cluster_overlay_alpha *
(1.0 - smoothstep(0.0, max_complexity_per_cluster, f32(object_count)));
#endif // CLUSTERED_FORWARD_DEBUG_CLUSTER_COMPLEXITY
#ifdef CLUSTERED_FORWARD_DEBUG_CLUSTER_COHERENCY
// NOTE: Visualizes the cluster to which the fragment belongs
let cluster_overlay_alpha = 0.1;
var rng = cluster_index;
let cluster_color_hsv = vec3(rand_f(&rng) * PI_2, 1.0, 0.5);
let cluster_color = hsv_to_rgb(cluster_color_hsv);
output_color = vec4<f32>(
(1.0 - cluster_overlay_alpha) * output_color.rgb + cluster_overlay_alpha * cluster_color,
output_color.a
);
#endif // CLUSTERED_FORWARD_DEBUG_CLUSTER_COHERENCY
return output_color;
}