
*Occlusion culling* allows the GPU to skip the vertex and fragment shading overhead for objects that can be quickly proved to be invisible because they're behind other geometry. A depth prepass already eliminates most fragment shading overhead for occluded objects, but the vertex shading overhead, as well as the cost of testing and rejecting fragments against the Z-buffer, is presently unavoidable for standard meshes. We currently perform occlusion culling only for meshlets. But other meshes, such as skinned meshes, can benefit from occlusion culling too in order to avoid the transform and skinning overhead for unseen meshes.

This commit adapts the same [*two-phase occlusion culling*] technique that meshlets use to Bevy's standard 3D mesh pipeline when the new `OcclusionCulling` component, as well as the `DepthPrepass` component, are present on the camera. It has these steps:

1. *Early depth prepass*: We use the hierarchical Z-buffer from the previous frame to cull meshes for the initial depth prepass, effectively rendering only the meshes that were visible in the last frame.

2. *Early depth downsample*: We downsample the depth buffer to create another hierarchical Z-buffer, this time with the current view transform.

3. *Late depth prepass*: We use the new hierarchical Z-buffer to test all meshes that weren't rendered in the early depth prepass. Any meshes that pass this check are rendered.

4. *Late depth downsample*: Again, we downsample the depth buffer to create a hierarchical Z-buffer in preparation for the early depth prepass of the next frame. This step is done after all the rendering, in order to account for custom phase items that might write to the depth buffer.

Note that this patch has no effect on the per-mesh CPU overhead for occluded objects, which remains high for a GPU-driven renderer due to the lack of `cold-specialization` and retained bins. If `cold-specialization` and retained bins weren't on the horizon, then a more traditional approach like potentially visible sets (PVS) or low-res CPU rendering would probably be more efficient than the GPU-driven approach that this patch implements for most scenes. However, at this point the amount of effort required to implement a PVS baking tool or a low-res CPU renderer would probably be greater than landing `cold-specialization` and retained bins, and the GPU-driven approach is the more modern one anyway. It does mean that the performance improvements from occlusion culling as implemented in this patch *today* are likely to be limited, because of the high CPU overhead for occluded meshes.

Note also that this patch currently doesn't implement occlusion culling for 2D objects or shadow maps. Those can be addressed in a follow-up. Additionally, note that the techniques in this patch require compute shaders, which excludes support for WebGL 2.

This PR is marked experimental because of known precision issues with the downsampling approach when applied to non-power-of-two framebuffer sizes (i.e. most of them). These precision issues can, in rare cases, cause objects to be judged occluded that in fact are not. (I've never seen this in practice, but I know it's possible; it tends to be likelier to happen with small meshes.) As a follow-up to this patch, we'd like to switch to the [SPD-based hi-Z buffer shader from the Granite engine], which doesn't suffer from these problems, at which point we should be able to graduate this feature from experimental status. I opted not to include that rewrite in this patch for two reasons: (1) @JMS55 is planning on doing the rewrite to coincide with the new availability of image atomic operations in Naga; (2) to reduce the scope of this patch.

A new example, `occlusion_culling`, has been added. It demonstrates objects becoming quickly occluded and disoccluded by dynamic geometry and shows the number of objects that are actually being rendered. Also, a new `--occlusion-culling` switch has been added to `scene_viewer`, in order to make it easy to test this patch with large scenes like Bistro.

[*two-phase occlusion culling*]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

[Aaltonen SIGGRAPH 2015]: https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf

[Some literature]: https://gist.github.com/reduz/c5769d0e705d8ab7ac187d63be0099b5?permalink_comment_id=5040452#gistcomment-5040452

[SPD-based hi-Z buffer shader from the Granite engine]: https://github.com/Themaister/Granite/blob/master/assets/shaders/post/hiz.comp

## Migration guide

* When enqueuing a custom mesh pipeline, work item buffers are now created with `bevy::render::batching::gpu_preprocessing::get_or_create_work_item_buffer`, not `PreprocessWorkItemBuffers::new`. See the `specialized_mesh_pipeline` example.

## Showcase

Occlusion culling example:

Bistro zoomed out, before occlusion culling:

Bistro zoomed out, after occlusion culling:

In this scene, occlusion culling reduces the number of meshes Bevy has to render from 1591 to 585.
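The four steps of the two-phase technique described above can be illustrated with a small CPU-side model. This is only a sketch under loud assumptions, not the code in this patch: the screen is one-dimensional, each object covers a column range at a single depth, larger depth means farther away (Bevy itself uses reverse-Z), and the "hierarchical" Z-buffer is a single full-resolution level, so the downsample steps reduce to a copy. The names `Object`, `is_occluded`, `draw`, and `render_frame` are hypothetical.

```rust
/// Hypothetical stand-in for a mesh: covers columns `x0..x1` at one depth.
/// Assumption: larger depth = farther from the camera (not reverse-Z).
#[derive(Clone, Copy, Debug)]
pub struct Object {
    pub x0: usize,
    pub x1: usize,
    pub depth: f32,
}

/// Conservative test: an object counts as occluded only if every column it
/// covers already holds a strictly nearer depth.
pub fn is_occluded(obj: &Object, hiz: &[f32]) -> bool {
    (obj.x0..obj.x1).all(|x| hiz[x] < obj.depth)
}

/// Writes the object into the depth buffer, keeping the nearest depth.
fn draw(obj: &Object, depth: &mut [f32]) {
    for x in obj.x0..obj.x1 {
        depth[x] = depth[x].min(obj.depth);
    }
}

/// Runs one frame. Returns the indices of the objects that were rendered and
/// the depth data to use as `prev_hiz` on the next frame.
pub fn render_frame(objects: &[Object], prev_hiz: &[f32]) -> (Vec<usize>, Vec<f32>) {
    let mut depth = vec![f32::INFINITY; prev_hiz.len()];
    let mut rendered = Vec::new();
    let mut deferred = Vec::new();

    // 1. Early depth prepass: cull against last frame's buffer.
    for (i, obj) in objects.iter().enumerate() {
        if is_occluded(obj, prev_hiz) {
            deferred.push(i);
        } else {
            draw(obj, &mut depth);
            rendered.push(i);
        }
    }

    // 2. Early depth downsample (a plain copy in this single-level model).
    let hiz = depth.clone();

    // 3. Late depth prepass: retest everything the early pass skipped.
    for i in deferred {
        if !is_occluded(&objects[i], &hiz) {
            draw(&objects[i], &mut depth);
            rendered.push(i);
        }
    }

    // 4. Late depth downsample: the final depth feeds the next frame.
    (rendered, depth)
}
```

In this model, a box hidden behind a wall is skipped by both passes once the wall is in the previous frame's buffer. If the wall then shrinks, the early pass still skips the box, but the late pass catches it against the current frame's depth, so disocclusion costs no extra frame of latency.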
335 lines
12 KiB
Rust
//! Render high-poly 3d meshes using an efficient GPU-driven method. See [`MeshletPlugin`] and [`MeshletMesh`] for details.

mod asset;
#[cfg(feature = "meshlet_processor")]
mod from_mesh;
mod instance_manager;
mod material_pipeline_prepare;
mod material_shade_nodes;
mod meshlet_mesh_manager;
mod persistent_buffer;
mod persistent_buffer_impls;
mod pipelines;
mod resource_manager;
mod visibility_buffer_raster_node;

pub mod graph {
    use bevy_render::render_graph::RenderLabel;

    #[derive(Debug, Hash, PartialEq, Eq, Clone, RenderLabel)]
    pub enum NodeMeshlet {
        VisibilityBufferRasterPass,
        Prepass,
        DeferredPrepass,
        MainOpaquePass,
    }
}

pub(crate) use self::{
    instance_manager::{queue_material_meshlet_meshes, InstanceManager},
    material_pipeline_prepare::{
        prepare_material_meshlet_meshes_main_opaque_pass, prepare_material_meshlet_meshes_prepass,
    },
};

pub use self::asset::{
    MeshletMesh, MeshletMeshLoader, MeshletMeshSaver, MESHLET_MESH_ASSET_VERSION,
};
#[cfg(feature = "meshlet_processor")]
pub use self::from_mesh::{
    MeshToMeshletMeshConversionError, MESHLET_DEFAULT_VERTEX_POSITION_QUANTIZATION_FACTOR,
};
use self::{
    graph::NodeMeshlet,
    instance_manager::extract_meshlet_mesh_entities,
    material_pipeline_prepare::{
        MeshletViewMaterialsDeferredGBufferPrepass, MeshletViewMaterialsMainOpaquePass,
        MeshletViewMaterialsPrepass,
    },
    material_shade_nodes::{
        MeshletDeferredGBufferPrepassNode, MeshletMainOpaquePass3dNode, MeshletPrepassNode,
    },
    meshlet_mesh_manager::{perform_pending_meshlet_mesh_writes, MeshletMeshManager},
    pipelines::*,
    resource_manager::{
        prepare_meshlet_per_frame_resources, prepare_meshlet_view_bind_groups, ResourceManager,
    },
    visibility_buffer_raster_node::MeshletVisibilityBufferRasterPassNode,
};
use crate::graph::NodePbr;
use crate::PreviousGlobalTransform;
use bevy_app::{App, Plugin};
use bevy_asset::{load_internal_asset, AssetApp, AssetId, Handle};
use bevy_core_pipeline::{
    core_3d::graph::{Core3d, Node3d},
    prepass::{DeferredPrepass, MotionVectorPrepass, NormalPrepass},
};
use bevy_derive::{Deref, DerefMut};
use bevy_ecs::{
    component::{require, Component},
    entity::Entity,
    query::Has,
    reflect::ReflectComponent,
    schedule::IntoSystemConfigs,
    system::{Commands, Query},
};
use bevy_reflect::{std_traits::ReflectDefault, Reflect};
use bevy_render::{
    render_graph::{RenderGraphApp, ViewNodeRunner},
    render_resource::Shader,
    renderer::RenderDevice,
    settings::WgpuFeatures,
    view::{self, prepare_view_targets, Msaa, Visibility, VisibilityClass},
    ExtractSchedule, Render, RenderApp, RenderSet,
};
use bevy_transform::components::Transform;
use derive_more::From;
use tracing::error;

const MESHLET_BINDINGS_SHADER_HANDLE: Handle<Shader> = Handle::weak_from_u128(1325134235233421);
const MESHLET_MESH_MATERIAL_SHADER_HANDLE: Handle<Shader> =
    Handle::weak_from_u128(3325134235233421);

/// Provides a plugin for rendering large amounts of high-poly 3d meshes using an efficient GPU-driven method. See also [`MeshletMesh`].
///
/// Rendering dense scenes made of high-poly meshes with thousands or millions of triangles is extremely expensive in Bevy's standard renderer.
/// Once meshes are pre-processed into a [`MeshletMesh`], this plugin can render these kinds of scenes very efficiently.
///
/// In comparison to Bevy's standard renderer:
/// * Much more efficient culling. Meshlets can be culled individually, instead of all or nothing culling for entire meshes at a time.
///   Additionally, occlusion culling can eliminate meshlets that would cause overdraw.
/// * Much more efficient batching. All geometry can be rasterized in a single draw.
/// * Scales better with large amounts of dense geometry and overdraw. Bevy's standard renderer will bottleneck sooner.
/// * Near-seamless level of detail (LOD).
/// * Much greater base overhead. Rendering will be slower and use more memory than Bevy's standard renderer
///   with small amounts of geometry and overdraw.
/// * Requires preprocessing meshes. See [`MeshletMesh`] for details.
/// * Limitations on the kinds of materials you can use. See [`MeshletMesh`] for details.
///
/// This plugin requires a fairly recent GPU that supports [`WgpuFeatures::SHADER_INT64_ATOMIC_MIN_MAX`].
///
/// This plugin currently works only on the Vulkan backend.
///
/// This plugin is not compatible with [`Msaa`]. Any camera rendering a [`MeshletMesh`] must have
/// [`Msaa`] set to [`Msaa::Off`].
///
/// Mixing forward+prepass and deferred rendering for opaque materials is not currently supported when using this plugin.
/// You must use one or the other by setting [`crate::DefaultOpaqueRendererMethod`].
/// Do not override [`crate::Material::opaque_render_method`] for any material when using this plugin.
///
///
pub struct MeshletPlugin {
    /// The maximum amount of clusters that can be processed at once,
    /// used to control the size of a pre-allocated GPU buffer.
    ///
    /// If this number is too low, you'll see rendering artifacts like missing or blinking meshes.
    ///
    /// Each cluster slot costs 4 bytes of VRAM.
    ///
    /// Must not be greater than 2^25.
    pub cluster_buffer_slots: u32,
}

impl MeshletPlugin {
    /// [`WgpuFeatures`] required for this plugin to function.
    pub fn required_wgpu_features() -> WgpuFeatures {
        WgpuFeatures::SHADER_INT64_ATOMIC_MIN_MAX
            | WgpuFeatures::SHADER_INT64
            | WgpuFeatures::SUBGROUP
            | WgpuFeatures::DEPTH_CLIP_CONTROL
            | WgpuFeatures::PUSH_CONSTANTS
    }
}

impl Plugin for MeshletPlugin {
    fn build(&self, app: &mut App) {
        #[cfg(target_endian = "big")]
        compile_error!("MeshletPlugin is only supported on little-endian processors.");

        if self.cluster_buffer_slots > 2_u32.pow(25) {
            error!("MeshletPlugin::cluster_buffer_slots must not be greater than 2^25.");
            std::process::exit(1);
        }

        load_internal_asset!(
            app,
            MESHLET_BINDINGS_SHADER_HANDLE,
            "meshlet_bindings.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            super::MESHLET_VISIBILITY_BUFFER_RESOLVE_SHADER_HANDLE,
            "visibility_buffer_resolve.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            MESHLET_FILL_CLUSTER_BUFFERS_SHADER_HANDLE,
            "fill_cluster_buffers.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            MESHLET_CULLING_SHADER_HANDLE,
            "cull_clusters.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            MESHLET_VISIBILITY_BUFFER_SOFTWARE_RASTER_SHADER_HANDLE,
            "visibility_buffer_software_raster.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            MESHLET_VISIBILITY_BUFFER_HARDWARE_RASTER_SHADER_HANDLE,
            "visibility_buffer_hardware_raster.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            MESHLET_MESH_MATERIAL_SHADER_HANDLE,
            "meshlet_mesh_material.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            MESHLET_RESOLVE_RENDER_TARGETS_SHADER_HANDLE,
            "resolve_render_targets.wgsl",
            Shader::from_wgsl
        );
        load_internal_asset!(
            app,
            MESHLET_REMAP_1D_TO_2D_DISPATCH_SHADER_HANDLE,
            "remap_1d_to_2d_dispatch.wgsl",
            Shader::from_wgsl
        );

        app.init_asset::<MeshletMesh>()
            .register_asset_loader(MeshletMeshLoader);
    }

    fn finish(&self, app: &mut App) {
        let Some(render_app) = app.get_sub_app_mut(RenderApp) else {
            return;
        };

        let render_device = render_app.world().resource::<RenderDevice>().clone();
        let features = render_device.features();
        if !features.contains(Self::required_wgpu_features()) {
            error!(
                "MeshletPlugin can't be used. GPU lacks support for required features: {:?}.",
                Self::required_wgpu_features().difference(features)
            );
            std::process::exit(1);
        }

        render_app
            .add_render_graph_node::<MeshletVisibilityBufferRasterPassNode>(
                Core3d,
                NodeMeshlet::VisibilityBufferRasterPass,
            )
            .add_render_graph_node::<ViewNodeRunner<MeshletPrepassNode>>(
                Core3d,
                NodeMeshlet::Prepass,
            )
            .add_render_graph_node::<ViewNodeRunner<MeshletDeferredGBufferPrepassNode>>(
                Core3d,
                NodeMeshlet::DeferredPrepass,
            )
            .add_render_graph_node::<ViewNodeRunner<MeshletMainOpaquePass3dNode>>(
                Core3d,
                NodeMeshlet::MainOpaquePass,
            )
            .add_render_graph_edges(
                Core3d,
                (
                    NodeMeshlet::VisibilityBufferRasterPass,
                    NodePbr::ShadowPass,
                    //
                    NodeMeshlet::Prepass,
                    //
                    NodeMeshlet::DeferredPrepass,
                    Node3d::DeferredPrepass,
                    Node3d::CopyDeferredLightingId,
                    Node3d::EndPrepasses,
                    //
                    Node3d::StartMainPass,
                    NodeMeshlet::MainOpaquePass,
                    Node3d::MainOpaquePass,
                    Node3d::EndMainPass,
                ),
            )
            .init_resource::<MeshletMeshManager>()
            .insert_resource(InstanceManager::new())
            .insert_resource(ResourceManager::new(
                self.cluster_buffer_slots,
                &render_device,
            ))
            .init_resource::<MeshletPipelines>()
            .add_systems(ExtractSchedule, extract_meshlet_mesh_entities)
            .add_systems(
                Render,
                (
                    perform_pending_meshlet_mesh_writes.in_set(RenderSet::PrepareAssets),
                    configure_meshlet_views
                        .after(prepare_view_targets)
                        .in_set(RenderSet::ManageViews),
                    prepare_meshlet_per_frame_resources.in_set(RenderSet::PrepareResources),
                    prepare_meshlet_view_bind_groups.in_set(RenderSet::PrepareBindGroups),
                ),
            );
    }
}

/// The meshlet mesh equivalent of [`bevy_render::mesh::Mesh3d`].
#[derive(Component, Clone, Debug, Default, Deref, DerefMut, Reflect, PartialEq, Eq, From)]
#[reflect(Component, Default)]
#[require(Transform, PreviousGlobalTransform, Visibility, VisibilityClass)]
#[component(on_add = view::add_visibility_class::<MeshletMesh3d>)]
pub struct MeshletMesh3d(pub Handle<MeshletMesh>);

impl From<MeshletMesh3d> for AssetId<MeshletMesh> {
    fn from(mesh: MeshletMesh3d) -> Self {
        mesh.id()
    }
}

impl From<&MeshletMesh3d> for AssetId<MeshletMesh> {
    fn from(mesh: &MeshletMesh3d) -> Self {
        mesh.id()
    }
}

fn configure_meshlet_views(
    mut views_3d: Query<(
        Entity,
        &Msaa,
        Has<NormalPrepass>,
        Has<MotionVectorPrepass>,
        Has<DeferredPrepass>,
    )>,
    mut commands: Commands,
) {
    for (entity, msaa, normal_prepass, motion_vector_prepass, deferred_prepass) in &mut views_3d {
        if *msaa != Msaa::Off {
            error!("MeshletPlugin can't be used with MSAA. Add Msaa::Off to your camera to use this plugin.");
            std::process::exit(1);
        }

        if !(normal_prepass || motion_vector_prepass || deferred_prepass) {
            commands
                .entity(entity)
                .insert(MeshletViewMaterialsMainOpaquePass::default());
        } else {
            // TODO: Should we add both Prepass and DeferredGBufferPrepass materials here, and in other systems/nodes?
            commands.entity(entity).insert((
                MeshletViewMaterialsMainOpaquePass::default(),
                MeshletViewMaterialsPrepass::default(),
                MeshletViewMaterialsDeferredGBufferPrepass::default(),
            ));
        }
    }
}
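The doc comment on `cluster_buffer_slots` states two constraints: each slot costs 4 bytes of VRAM, and the count must not exceed 2^25. The standalone sketch below works through that arithmetic. The constant and function names are hypothetical and not part of Bevy. It also returns a `Result` as an assumption for illustration, whereas the plugin logs an error and exits the process.

```rust
/// Upper bound on `cluster_buffer_slots` taken from the doc comment: 2^25.
/// (Hypothetical constant name; not defined in Bevy.)
pub const MAX_CLUSTER_BUFFER_SLOTS: u32 = 1 << 25;

/// Per-slot VRAM cost in bytes, taken from the doc comment.
/// (Hypothetical constant name; not defined in Bevy.)
pub const BYTES_PER_CLUSTER_SLOT: u64 = 4;

/// Returns the VRAM, in bytes, that a given slot count would occupy,
/// or an error if the count exceeds the documented limit.
pub fn cluster_buffer_bytes(slots: u32) -> Result<u64, String> {
    if slots > MAX_CLUSTER_BUFFER_SLOTS {
        return Err(format!(
            "cluster_buffer_slots ({slots}) must not be greater than 2^25"
        ));
    }
    // Widen to u64 before multiplying so the product cannot overflow.
    Ok(u64::from(slots) * BYTES_PER_CLUSTER_SLOT)
}
```

At the documented maximum of 2^25 slots this comes to 2^27 bytes, or 128 MiB, which is the largest allocation the limit allows for this one buffer.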