
Currently, Bevy's implementation of bindless resources is rather unusual: every binding in an object that implements `AsBindGroup` (most commonly, a material) becomes its own separate binding array in the shader. This is inefficient for two reasons:

1. If multiple materials reference the same texture or other resource, the reference to that resource will be duplicated many times. This increases `wgpu` validation overhead.

2. It creates many unused binding array slots. This increases `wgpu` and driver overhead and makes it easier to hit limits on APIs, such as Metal, on which `wgpu` currently imposes tight resource limits.

This PR fixes these issues by switching Bevy to use the standard approach in GPU-driven renderers, in which resources are de-duplicated and passed as global arrays, one for each type of resource.

Along the way, this patch introduces per-platform resource limits and bumps them from 16 resources per binding array to 64 resources per bind group on Metal and 2048 resources per bind group on other platforms. (Note that the number of resources per *binding array* isn't the same as the number of resources per *bind group*; as it currently stands, if all the PBR features are turned on, Bevy could pack as many as 496 resources into a single slab.) The limits have been increased because `wgpu` now has universal support for partially-bound binding arrays, which means that we no longer need to fill the binding arrays with fallback resources on Direct3D 12. The `#[bindless(LIMIT)]` declaration when deriving `AsBindGroup` can now simply be written `#[bindless]` in order to have Bevy choose a default limit size for the current platform. Custom limits are still available with the new `#[bindless(limit(LIMIT))]` syntax: e.g. `#[bindless(limit(8))]`.

The material bind group allocator has been completely rewritten. Now there are two allocators: one for bindless materials and one for non-bindless materials.
The new non-bindless material allocator simply maintains a 1:1 mapping from material to bind group. The new bindless material allocator maintains a list of slabs and allocates materials into slabs on a first-fit basis. This unfortunately makes its performance O(number of resources per object * number of slabs), but the number of slabs is likely to be low, and it's planned to become even lower in the future with `wgpu` improvements. Resources are de-duplicated within a slab and reference counted. So, for instance, if multiple materials refer to the same texture, that texture will exist only once in the appropriate binding array.

To support these new features, this patch adds the concept of a *bindless descriptor* to the `AsBindGroup` trait. The bindless descriptor allows the material bind group allocator to probe the layout of the material, now that an array of `BindGroupLayoutEntry` records is insufficient to describe the group. The `#[derive(AsBindGroup)]` macro has been heavily modified to support the new features. The most important user-facing change to that macro is that the struct-level `uniform` attribute, `#[uniform(BINDING_NUMBER, StandardMaterial)]`, now reads `#[uniform(BINDLESS_INDEX, MATERIAL_UNIFORM_TYPE, binding_array(BINDING_NUMBER))]`, allowing the material to specify the binding number for the binding array that holds the uniform data.

To make this patch simpler, I removed support for bindless `ExtendedMaterial`s, as well as field-level bindless uniform and storage buffers. I intend to add back support for these as a follow-up. Because these features haven't shipped in any released Bevy version yet, I figured this was OK.

Finally, this patch updates `StandardMaterial` for the new bindless changes. Generally, code throughout the PBR shaders that looked like `base_color_texture[slot]` now looks like `bindless_textures_2d[material_indices[slot].base_color_texture]`.
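To make the bindless allocation scheme described above more concrete, here is a minimal sketch of first-fit slab allocation with per-slab de-duplication and reference counting. It is not Bevy's actual implementation: `ResourceId`, `Slab`, `Allocation`, and `BindlessAllocator` are hypothetical names invented for illustration, and a single binding array per slab stands in for the one-array-per-resource-type layout.

```rust
use std::collections::HashMap;

/// Hypothetical handle for a GPU resource such as a texture or sampler.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct ResourceId(pub u32);

/// One slab: a fixed-capacity binding array whose entries are
/// de-duplicated and reference counted.
pub struct Slab {
    capacity: usize,
    /// `Some((resource, reference count))` for used slots, `None` for free ones.
    slots: Vec<Option<(ResourceId, u32)>>,
    /// Reverse lookup from resource to slot, used for de-duplication.
    slot_of: HashMap<ResourceId, usize>,
}

impl Slab {
    fn new(capacity: usize) -> Self {
        Slab { capacity, slots: Vec::new(), slot_of: HashMap::new() }
    }

    /// Places every resource of one material, or returns `None` (leaving the
    /// slab untouched) if the resources that are not yet resident do not fit.
    fn try_allocate(&mut self, resources: &[ResourceId]) -> Option<Vec<usize>> {
        let mut missing: Vec<ResourceId> = Vec::new();
        for r in resources {
            if !self.slot_of.contains_key(r) && !missing.contains(r) {
                missing.push(*r);
            }
        }
        if self.slot_of.len() + missing.len() > self.capacity {
            return None;
        }
        let mut indices = Vec::with_capacity(resources.len());
        for &r in resources {
            let slot = match self.slot_of.get(&r).copied() {
                Some(slot) => slot,
                None => {
                    // Reuse a freed slot if there is one, else grow the array.
                    let slot = match self.slots.iter().position(|s| s.is_none()) {
                        Some(free) => free,
                        None => {
                            self.slots.push(None);
                            self.slots.len() - 1
                        }
                    };
                    self.slots[slot] = Some((r, 0));
                    self.slot_of.insert(r, slot);
                    slot
                }
            };
            if let Some((_, count)) = self.slots[slot].as_mut() {
                *count += 1;
            }
            indices.push(slot);
        }
        Some(indices)
    }

    /// Drops one reference per index; a slot is released when its count hits zero.
    fn free(&mut self, indices: &[usize]) {
        for &i in indices {
            let mut released = None;
            if let Some((id, count)) = self.slots[i].as_mut() {
                *count -= 1;
                if *count == 0 {
                    released = Some(*id);
                }
            }
            if let Some(id) = released {
                self.slots[i] = None;
                self.slot_of.remove(&id);
            }
        }
    }
}

/// Where a material landed: the slab and one slot index per resource.
pub struct Allocation {
    pub slab: usize,
    pub indices: Vec<usize>,
}

pub struct BindlessAllocator {
    slab_capacity: usize,
    slabs: Vec<Slab>,
}

impl BindlessAllocator {
    pub fn new(slab_capacity: usize) -> Self {
        BindlessAllocator { slab_capacity, slabs: Vec::new() }
    }

    /// First fit: try each slab in order, creating a new one only if none fits.
    pub fn allocate(&mut self, resources: &[ResourceId]) -> Allocation {
        for (slab, s) in self.slabs.iter_mut().enumerate() {
            if let Some(indices) = s.try_allocate(resources) {
                return Allocation { slab, indices };
            }
        }
        let mut s = Slab::new(self.slab_capacity);
        let indices = s.try_allocate(resources).expect("material exceeds slab capacity");
        self.slabs.push(s);
        Allocation { slab: self.slabs.len() - 1, indices }
    }

    pub fn free(&mut self, allocation: &Allocation) {
        self.slabs[allocation.slab].free(&allocation.indices);
    }

    /// Number of distinct resources currently resident in a slab.
    pub fn resident_count(&self, slab: usize) -> usize {
        self.slabs[slab].slot_of.len()
    }
}
```

In this sketch, two materials that reference the same pair of textures share two slots in one slab, and those slots are only released once both materials have been freed. The indices returned by `allocate` play the role that `material_indices[slot]` plays in the shaders.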
This patch fixes a system hang that I experienced on the [Caldera test] when running with `caldera --random-materials --texture-count 100`. The time per frame is around 19.75 ms, down from 154.2 ms in Bevy 0.14: a 7.8× speedup.

[Caldera test]: https://github.com/DGriffin91/bevy_caldera_scene
140 lines
5.5 KiB
WebGPU Shading Language
#define_import_path bevy_pbr::parallax_mapping

#import bevy_render::bindless::{bindless_samplers_filtering, bindless_textures_2d}

#import bevy_pbr::{
    pbr_bindings::{depth_map_texture, depth_map_sampler},
    mesh_bindings::mesh
}

#ifdef BINDLESS
#import bevy_pbr::pbr_bindings::material_indices
#endif  // BINDLESS

fn sample_depth_map(uv: vec2<f32>, material_bind_group_slot: u32) -> f32 {
    // We use `textureSampleLevel` over `textureSample` because the wgpu DX12
    // backend (Fxc) panics when using "gradient instructions" inside a loop.
    // It results in the whole loop being unrolled by the shader compiler,
    // which it can't do because the upper limit of the loop in steep parallax
    // mapping is a variable set by the user.
    // The "gradient instructions" comes from `textureSample` computing MIP level
    // based on UV derivative. With `textureSampleLevel`, we provide ourselves
    // the MIP level, so no gradient instructions are used, and we can use
    // sample_depth_map in our loop.
    // See https://stackoverflow.com/questions/56581141/direct3d11-gradient-instruction-used-in-a-loop-with-varying-iteration-forcing
    return textureSampleLevel(
#ifdef BINDLESS
        bindless_textures_2d[material_indices[material_bind_group_slot].depth_map_texture],
        bindless_samplers_filtering[material_indices[material_bind_group_slot].depth_map_sampler],
#else   // BINDLESS
        depth_map_texture,
        depth_map_sampler,
#endif  // BINDLESS
        uv,
        0.0
    ).r;
}

// An implementation of parallax mapping, see https://en.wikipedia.org/wiki/Parallax_mapping
// Code derived from: https://web.archive.org/web/20150419215321/http://sunandblackcat.com/tipFullView.php?l=eng&topicid=28
fn parallaxed_uv(
    depth_scale: f32,
    max_layer_count: f32,
    max_steps: u32,
    // The original interpolated uv
    original_uv: vec2<f32>,
    // The vector from the camera to the fragment at the surface in tangent space
    Vt: vec3<f32>,
    material_bind_group_slot: u32,
) -> vec2<f32> {
    if max_layer_count < 1.0 {
        return original_uv;
    }
    var uv = original_uv;

    // Steep Parallax Mapping
    // ======================
    // Split the depth map into `layer_count` layers.
    // When Vt hits the surface of the mesh (excluding depth displacement),
    // if the depth is not below or on surface including depth displacement (textureSample), then
    // look forward (+= delta_uv) on depth texture according to
    // Vt and distance between hit surface and depth map surface,
    // repeat until below the surface.
    //
    // Where `layer_count` is interpolated between `1.0` and
    // `max_layer_count` according to the steepness of Vt.

    let view_steepness = abs(Vt.z);
    // We mix with minimum value 1.0 because otherwise,
    // with 0.0, we get a division by zero in surfaces parallel to viewport,
    // resulting in a singularity.
    let layer_count = mix(max_layer_count, 1.0, view_steepness);
    let layer_depth = 1.0 / layer_count;
    var delta_uv = depth_scale * layer_depth * Vt.xy * vec2(1.0, -1.0) / view_steepness;

    var current_layer_depth = 0.0;
    var texture_depth = sample_depth_map(uv, material_bind_group_slot);

    // texture_depth > current_layer_depth means the depth map depth is deeper
    // than the depth the ray would be at this UV offset so the ray has not
    // intersected the surface
    for (var i: i32 = 0; texture_depth > current_layer_depth && i <= i32(layer_count); i++) {
        current_layer_depth += layer_depth;
        uv += delta_uv;
        texture_depth = sample_depth_map(uv, material_bind_group_slot);
    }

#ifdef RELIEF_MAPPING
    // Relief Mapping
    // ==============
    // "Refine" the rough result from Steep Parallax Mapping
    // with a **binary search** between the layer selected by steep parallax
    // and the next one to find a point closer to the depth map surface.
    // This reduces the jaggy step artifacts from steep parallax mapping.

    delta_uv *= 0.5;
    var delta_depth = 0.5 * layer_depth;

    uv -= delta_uv;
    current_layer_depth -= delta_depth;

    for (var i: u32 = 0u; i < max_steps; i++) {
        texture_depth = sample_depth_map(uv, material_bind_group_slot);

        // Halve the deltas for the next step
        delta_uv *= 0.5;
        delta_depth *= 0.5;

        // Step based on whether the current depth is above or below the depth map
        if (texture_depth > current_layer_depth) {
            uv += delta_uv;
            current_layer_depth += delta_depth;
        } else {
            uv -= delta_uv;
            current_layer_depth -= delta_depth;
        }
    }
#else
    // Parallax Occlusion mapping
    // ==========================
    // "Refine" Steep Parallax Mapping by interpolating between the
    // previous layer's depth and the computed layer depth.
    // Only requires a single lookup, unlike Relief Mapping, but
    // may skip small details and result in writhing material artifacts.
    let previous_uv = uv - delta_uv;
    let next_depth = texture_depth - current_layer_depth;
    let previous_depth = sample_depth_map(previous_uv, material_bind_group_slot) -
        current_layer_depth + layer_depth;

    let weight = next_depth / (next_depth - previous_depth);

    uv = mix(uv, previous_uv, weight);

    current_layer_depth += mix(next_depth, previous_depth, weight);
#endif

    // Note: `current_layer_depth` is not returned, but may be useful
    // for light computation later on in future improvements of the pbr shader.
    return uv;
}
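
As a way to reason about the shader above off the GPU, the following is a rough CPU-side sketch in Rust of the steep parallax loop followed by the parallax occlusion interpolation (the non-`RELIEF_MAPPING` branch). The function name `parallaxed_uv_cpu` and the `depth_at` closure, which stands in for `sample_depth_map`, are assumptions made for illustration; they are not part of Bevy.

```rust
/// CPU reference sketch of steep parallax mapping plus parallax occlusion
/// interpolation. `depth_at` is a hypothetical stand-in for the depth-map
/// texture lookup and returns a depth in the range 0.0..=1.0.
pub fn parallaxed_uv_cpu(
    depth_scale: f32,
    max_layer_count: f32,
    original_uv: [f32; 2],
    // Camera-to-fragment vector in tangent space (`Vt` in the shader).
    vt: [f32; 3],
    depth_at: impl Fn([f32; 2]) -> f32,
) -> [f32; 2] {
    if max_layer_count < 1.0 {
        return original_uv;
    }
    // Same definition as WGSL `mix(x, y, a)`.
    let mix = |x: f32, y: f32, a: f32| x * (1.0 - a) + y * a;

    let view_steepness = vt[2].abs();
    let layer_count = mix(max_layer_count, 1.0, view_steepness);
    let layer_depth = 1.0 / layer_count;
    // The y component is negated, matching `vec2(1.0, -1.0)` in the shader.
    let delta_uv = [
        depth_scale * layer_depth * vt[0] / view_steepness,
        -depth_scale * layer_depth * vt[1] / view_steepness,
    ];

    let mut uv = original_uv;
    let mut current_layer_depth = 0.0_f32;
    let mut texture_depth = depth_at(uv);

    // March layer by layer until the ray is at or below the depth-map surface.
    let mut i = 0;
    while texture_depth > current_layer_depth && i <= layer_count as i32 {
        current_layer_depth += layer_depth;
        uv = [uv[0] + delta_uv[0], uv[1] + delta_uv[1]];
        texture_depth = depth_at(uv);
        i += 1;
    }

    // Interpolate between the last two samples to estimate the intersection.
    let previous_uv = [uv[0] - delta_uv[0], uv[1] - delta_uv[1]];
    let next_depth = texture_depth - current_layer_depth;
    let previous_depth = depth_at(previous_uv) - current_layer_depth + layer_depth;
    let weight = next_depth / (next_depth - previous_depth);

    [
        mix(uv[0], previous_uv[0], weight),
        mix(uv[1], previous_uv[1], weight),
    ]
}
```

For a flat depth map the interpolation recovers the ray/plane intersection: with a constant depth of 0.5, `depth_scale = 0.1`, and `vt = [0.6, 0.0, 0.8]`, the expected UV offset is 0.1 × 0.5 × 0.6 / 0.8 = 0.0375 along the first axis.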