
*Occlusion culling* allows the GPU to skip the vertex and fragment shading overhead for objects that can quickly be proven invisible because they're behind other geometry. A depth prepass already eliminates most fragment shading overhead for occluded objects, but the vertex shading overhead, as well as the cost of testing and rejecting fragments against the Z-buffer, is presently unavoidable for standard meshes. We currently perform occlusion culling only for meshlets. Other meshes, such as skinned meshes, can benefit from occlusion culling too, because it avoids the transform and skinning overhead for unseen meshes.

This commit adapts the same [*two-phase occlusion culling*] technique that meshlets use to Bevy's standard 3D mesh pipeline. It takes effect when both the new `OcclusionCulling` component and the `DepthPrepass` component are present on the camera, and it has these steps (a minimal code sketch of the underlying hierarchical Z-buffer operations follows the list):

1. *Early depth prepass*: We use the hierarchical Z-buffer from the previous frame to cull meshes for the initial depth prepass, effectively rendering only the meshes that were visible in the last frame.

2. *Early depth downsample*: We downsample the depth buffer to create another hierarchical Z-buffer, this time with the current view transform.

3. *Late depth prepass*: We use the new hierarchical Z-buffer to test all meshes that weren't rendered in the early depth prepass. Any meshes that pass this check are rendered.

4. *Late depth downsample*: Again, we downsample the depth buffer to create a hierarchical Z-buffer, in preparation for the early depth prepass of the next frame. This step runs after all other rendering, in order to account for custom phase items that might write to the depth buffer.
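The sketch below is purely illustrative and assumes Bevy's reversed-Z convention (larger depth values are nearer to the camera). All types and functions in it are invented for this description; the real implementation runs in compute shaders. It shows the two building blocks the steps above rely on: the conservative min-reduction that produces each hierarchical Z-buffer level (steps 2 and 4), and the footprint test that decides whether a mesh might still be visible (steps 1 and 3).

```rust
/// One level of a hypothetical hierarchical Z-buffer (hi-Z) pyramid. Under
/// reversed Z, each texel stores the *minimum* (farthest) depth of the region
/// it covers, so tests against it can only err on the side of "visible".
struct HzbLevel {
    width: u32,
    height: u32,
    depth: Vec<f32>,
}

impl HzbLevel {
    fn sample(&self, x: u32, y: u32) -> f32 {
        self.depth[(y.min(self.height - 1) * self.width + x.min(self.width - 1)) as usize]
    }
}

/// Steps 2/4: build the next hi-Z level by taking the farthest (minimum) depth
/// of each 2x2 block, so occluders never appear nearer than they really are.
/// Note that a naive 2x2 reduction like this one can skip edge texels when the
/// source size isn't a power of two, which is the source of the precision
/// issues discussed later in this description.
fn downsample(src: &HzbLevel) -> HzbLevel {
    let (width, height) = ((src.width / 2).max(1), (src.height / 2).max(1));
    let mut depth = vec![0.0; (width * height) as usize];
    for y in 0..height {
        for x in 0..width {
            depth[(y * width + x) as usize] = src
                .sample(x * 2, y * 2)
                .min(src.sample(x * 2 + 1, y * 2))
                .min(src.sample(x * 2, y * 2 + 1))
                .min(src.sample(x * 2 + 1, y * 2 + 1));
        }
    }
    HzbLevel { width, height, depth }
}

/// Steps 1/3: test a mesh's screen-space bounding rectangle (in level-0 texel
/// coordinates) and its nearest projected depth against the pyramid. Returns
/// `false` only if the mesh is provably behind geometry that was already drawn.
fn is_potentially_visible(
    min: (u32, u32),
    max: (u32, u32),
    nearest_depth: f32,
    pyramid: &[HzbLevel],
) -> bool {
    assert!(!pyramid.is_empty(), "need at least one hi-Z level");

    // Pick the level at which the rectangle spans at most two texels per axis,
    // so a 2x2 footprint (the four corners) covers it.
    let extent = (max.0 - min.0).max(max.1 - min.1).max(1);
    let level = (extent.next_power_of_two().trailing_zeros() as usize).min(pyramid.len() - 1);
    let mip = &pyramid[level];

    let (x0, y0) = (min.0 >> level, min.1 >> level);
    let (x1, y1) = (max.0 >> level, max.1 >> level);
    let farthest_occluder = mip
        .sample(x0, y0)
        .min(mip.sample(x1, y0))
        .min(mip.sample(x0, y1))
        .min(mip.sample(x1, y1));

    // The mesh may be visible unless its nearest point is behind (has a smaller
    // depth than, under reversed Z) the farthest occluder covering its footprint.
    nearest_depth >= farthest_occluder
}
```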
Note that this patch has no effect on the per-mesh CPU overhead for occluded objects, which remains high for a GPU-driven renderer due to the lack of `cold-specialization` and retained bins. If `cold-specialization` and retained bins weren't on the horizon, then a more traditional approach like potentially visible sets (PVS) or low-res CPU rendering would probably be more efficient for most scenes than the GPU-driven approach that this patch implements. However, at this point the effort required to implement a PVS baking tool or a low-res CPU renderer would probably be greater than landing `cold-specialization` and retained bins, and the GPU-driven approach is the more modern one anyway. It does mean that the performance improvements from occlusion culling as implemented in this patch *today* are likely to be limited, because of the high CPU overhead for occluded meshes.

Note also that this patch currently doesn't implement occlusion culling for 2D objects or shadow maps. Those can be addressed in a follow-up. Additionally, the techniques in this patch require compute shaders, which rules out WebGL 2 support.

This PR is marked experimental because of known precision issues with the downsampling approach when applied to non-power-of-two framebuffer sizes (i.e. most of them). These precision issues can, in rare cases, cause objects to be judged occluded when they in fact are not. (I've never seen this in practice, but I know it's possible; it tends to be likelier to happen with small meshes.) As a follow-up to this patch, we'd like to switch to the [SPD-based hi-Z buffer shader from the Granite engine], which doesn't suffer from these problems, at which point we should be able to graduate this feature from experimental status. I opted not to include that rewrite in this patch for two reasons: (1) @JMS55 is planning on doing the rewrite to coincide with the new availability of image atomic operations in Naga; (2) to reduce the scope of this patch.

A new example, `occlusion_culling`, has been added. It demonstrates objects quickly becoming occluded and disoccluded by dynamic geometry and shows the number of objects that are actually being rendered. Also, a new `--occlusion-culling` switch has been added to `scene_viewer`, to make it easy to test this patch with large scenes like Bistro.

[*two-phase occlusion culling*]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501
[Aaltonen SIGGRAPH 2015]: https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf
[Some literature]: https://gist.github.com/reduz/c5769d0e705d8ab7ac187d63be0099b5?permalink_comment_id=5040452#gistcomment-5040452
[SPD-based hi-Z buffer shader from the Granite engine]: https://github.com/Themaister/Granite/blob/master/assets/shaders/post/hiz.comp

## Migration guide

* When enqueuing a custom mesh pipeline, work item buffers are now created with `bevy::render::batching::gpu_preprocessing::get_or_create_work_item_buffer`, not `PreprocessWorkItemBuffers::new`. See the `specialized_mesh_pipeline` example.

## Showcase

Occlusion culling example:

Bistro zoomed out, before occlusion culling:

Bistro zoomed out, after occlusion culling:

In this scene, occlusion culling reduces the number of meshes Bevy has to render from 1591 to 585.
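For reference, opting a camera into the new path is just a matter of adding the two components mentioned above; a hedged sketch follows (the import path for `OcclusionCulling` is an assumption on my part, so check the `occlusion_culling` example for the exact setup):

```rust
use bevy::core_pipeline::prepass::DepthPrepass;
use bevy::prelude::*;
// Assumed location of the new component; see the `occlusion_culling` example
// for the authoritative import.
use bevy::render::experimental::occlusion_culling::OcclusionCulling;

fn setup(mut commands: Commands) {
    commands.spawn((
        Camera3d::default(),
        // Both components must be present for two-phase occlusion culling to
        // run for this camera.
        DepthPrepass,
        OcclusionCulling,
    ));
}
```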
use crate::{
    render_graph::{NodeState, RenderGraph, SlotInfos, SlotLabel, SlotType, SlotValue},
    render_resource::{Buffer, Sampler, TextureView},
};
use alloc::borrow::Cow;
use bevy_ecs::{entity::Entity, intern::Interned};
use thiserror::Error;

use super::{InternedRenderSubGraph, RenderLabel, RenderSubGraph};

/// A command that signals the graph runner to run the sub graph corresponding to the `sub_graph`
/// with the specified `inputs` next.
pub struct RunSubGraph {
    pub sub_graph: InternedRenderSubGraph,
    pub inputs: Vec<SlotValue>,
    pub view_entity: Option<Entity>,
}

/// The context with all graph information required to run a [`Node`](super::Node).
/// This context is created for each node by the render graph runner.
///
/// The slot input can be read from here and the outputs must be written back to the context for
/// passing them onto the next node.
///
/// Sub graphs can be queued for running by adding a [`RunSubGraph`] command to the context.
/// After the node has finished running the graph runner is responsible for executing the sub graphs.
pub struct RenderGraphContext<'a> {
    graph: &'a RenderGraph,
    node: &'a NodeState,
    inputs: &'a [SlotValue],
    outputs: &'a mut [Option<SlotValue>],
    run_sub_graphs: Vec<RunSubGraph>,
    /// The `view_entity` associated with the render graph being executed
    /// This is optional because you aren't required to have a `view_entity` for a node.
    /// For example, compute shader nodes don't have one.
    /// It should always be set when the [`RenderGraph`] is running on a View.
    view_entity: Option<Entity>,
}

impl<'a> RenderGraphContext<'a> {
    /// Creates a new render graph context for the `node`.
    pub fn new(
        graph: &'a RenderGraph,
        node: &'a NodeState,
        inputs: &'a [SlotValue],
        outputs: &'a mut [Option<SlotValue>],
    ) -> Self {
        Self {
            graph,
            node,
            inputs,
            outputs,
            run_sub_graphs: Vec::new(),
            view_entity: None,
        }
    }

    /// Returns the input slot values for the node.
    #[inline]
    pub fn inputs(&self) -> &[SlotValue] {
        self.inputs
    }

    /// Returns the [`SlotInfos`] of the inputs.
    pub fn input_info(&self) -> &SlotInfos {
        &self.node.input_slots
    }

    /// Returns the [`SlotInfos`] of the outputs.
    pub fn output_info(&self) -> &SlotInfos {
        &self.node.output_slots
    }

    /// Retrieves the input slot value referenced by the `label`.
    pub fn get_input(&self, label: impl Into<SlotLabel>) -> Result<&SlotValue, InputSlotError> {
        let label = label.into();
        let index = self
            .input_info()
            .get_slot_index(label.clone())
            .ok_or(InputSlotError::InvalidSlot(label))?;
        Ok(&self.inputs[index])
    }

    // TODO: should this return an Arc or a reference?
    /// Retrieves the input slot value referenced by the `label` as a [`TextureView`].
    pub fn get_input_texture(
        &self,
        label: impl Into<SlotLabel>,
    ) -> Result<&TextureView, InputSlotError> {
        let label = label.into();
        match self.get_input(label.clone())? {
            SlotValue::TextureView(value) => Ok(value),
            value => Err(InputSlotError::MismatchedSlotType {
                label,
                actual: value.slot_type(),
                expected: SlotType::TextureView,
            }),
        }
    }

    /// Retrieves the input slot value referenced by the `label` as a [`Sampler`].
    pub fn get_input_sampler(
        &self,
        label: impl Into<SlotLabel>,
    ) -> Result<&Sampler, InputSlotError> {
        let label = label.into();
        match self.get_input(label.clone())? {
            SlotValue::Sampler(value) => Ok(value),
            value => Err(InputSlotError::MismatchedSlotType {
                label,
                actual: value.slot_type(),
                expected: SlotType::Sampler,
            }),
        }
    }

    /// Retrieves the input slot value referenced by the `label` as a [`Buffer`].
    pub fn get_input_buffer(&self, label: impl Into<SlotLabel>) -> Result<&Buffer, InputSlotError> {
        let label = label.into();
        match self.get_input(label.clone())? {
            SlotValue::Buffer(value) => Ok(value),
            value => Err(InputSlotError::MismatchedSlotType {
                label,
                actual: value.slot_type(),
                expected: SlotType::Buffer,
            }),
        }
    }

    /// Retrieves the input slot value referenced by the `label` as an [`Entity`].
    pub fn get_input_entity(&self, label: impl Into<SlotLabel>) -> Result<Entity, InputSlotError> {
        let label = label.into();
        match self.get_input(label.clone())? {
            SlotValue::Entity(value) => Ok(*value),
            value => Err(InputSlotError::MismatchedSlotType {
                label,
                actual: value.slot_type(),
                expected: SlotType::Entity,
            }),
        }
    }

    /// Sets the output slot value referenced by the `label`.
    pub fn set_output(
        &mut self,
        label: impl Into<SlotLabel>,
        value: impl Into<SlotValue>,
    ) -> Result<(), OutputSlotError> {
        let label = label.into();
        let value = value.into();
        let slot_index = self
            .output_info()
            .get_slot_index(label.clone())
            .ok_or_else(|| OutputSlotError::InvalidSlot(label.clone()))?;
        let slot = self
            .output_info()
            .get_slot(slot_index)
            .expect("slot is valid");
        if value.slot_type() != slot.slot_type {
            return Err(OutputSlotError::MismatchedSlotType {
                label,
                actual: value.slot_type(),
                expected: slot.slot_type,
            });
        }
        self.outputs[slot_index] = Some(value);
        Ok(())
    }

    /// Returns the [`Entity`] of the view that the render graph is being run for.
    ///
    /// Panics if no view entity has been set; use [`Self::get_view_entity`] for a
    /// non-panicking variant.
    pub fn view_entity(&self) -> Entity {
        self.view_entity.unwrap()
    }

    /// Returns the view entity associated with this context, if any.
    pub fn get_view_entity(&self) -> Option<Entity> {
        self.view_entity
    }

    /// Sets the view entity associated with this context.
    pub fn set_view_entity(&mut self, view_entity: Entity) {
        self.view_entity = Some(view_entity);
    }

    /// Queues up a sub graph for execution after the node has finished running.
    pub fn run_sub_graph(
        &mut self,
        name: impl RenderSubGraph,
        inputs: Vec<SlotValue>,
        view_entity: Option<Entity>,
    ) -> Result<(), RunSubGraphError> {
        let name = name.intern();
        let sub_graph = self
            .graph
            .get_sub_graph(name)
            .ok_or(RunSubGraphError::MissingSubGraph(name))?;
        if let Some(input_node) = sub_graph.get_input_node() {
            for (i, input_slot) in input_node.input_slots.iter().enumerate() {
                if let Some(input_value) = inputs.get(i) {
                    if input_slot.slot_type != input_value.slot_type() {
                        return Err(RunSubGraphError::MismatchedInputSlotType {
                            graph_name: name,
                            slot_index: i,
                            actual: input_value.slot_type(),
                            expected: input_slot.slot_type,
                            label: input_slot.name.clone().into(),
                        });
                    }
                } else {
                    return Err(RunSubGraphError::MissingInput {
                        slot_index: i,
                        slot_name: input_slot.name.clone(),
                        graph_name: name,
                    });
                }
            }
        } else if !inputs.is_empty() {
            return Err(RunSubGraphError::SubGraphHasNoInputs(name));
        }

        self.run_sub_graphs.push(RunSubGraph {
            sub_graph: name,
            inputs,
            view_entity,
        });

        Ok(())
    }

    /// Returns a human-readable label for this node, for debugging purposes.
    pub fn label(&self) -> Interned<dyn RenderLabel> {
        self.node.label
    }

    /// Finishes the context for this [`Node`](super::Node) by
    /// returning the sub graphs to run next.
    pub fn finish(self) -> Vec<RunSubGraph> {
        self.run_sub_graphs
    }
}
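
// Example usage (illustrative only, not part of this module): a typical
// `Node::run` implementation reads its inputs from the context, writes its
// outputs back, and may queue a sub graph to run once the node has finished.
// The signature is abbreviated, and `MyNodeSubGraph` and the "view" slot name
// are made up for this sketch.
//
//     fn run(
//         &self,
//         graph: &mut RenderGraphContext,
//         _render_context: &mut RenderContext,
//         _world: &World,
//     ) -> Result<(), NodeRunError> {
//         let view = graph.get_input_entity("view")?;
//         graph.run_sub_graph(MyNodeSubGraph, vec![SlotValue::Entity(view)], Some(view))?;
//         Ok(())
//     }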

#[derive(Error, Debug, Eq, PartialEq)]
pub enum RunSubGraphError {
    #[error("attempted to run sub-graph `{0:?}`, but it does not exist")]
    MissingSubGraph(InternedRenderSubGraph),
    #[error("attempted to pass inputs to sub-graph `{0:?}`, which has no input slots")]
    SubGraphHasNoInputs(InternedRenderSubGraph),
    #[error("sub graph (name: `{graph_name:?}`) could not be run because slot `{slot_name}` at index {slot_index} has no value")]
    MissingInput {
        slot_index: usize,
        slot_name: Cow<'static, str>,
        graph_name: InternedRenderSubGraph,
    },
    #[error("attempted to use the wrong type for input slot")]
    MismatchedInputSlotType {
        graph_name: InternedRenderSubGraph,
        slot_index: usize,
        label: SlotLabel,
        expected: SlotType,
        actual: SlotType,
    },
}

#[derive(Error, Debug, Eq, PartialEq)]
pub enum OutputSlotError {
    #[error("output slot `{0:?}` does not exist")]
    InvalidSlot(SlotLabel),
    #[error("attempted to output a value of type `{actual}` to output slot `{label:?}`, which has type `{expected}`")]
    MismatchedSlotType {
        label: SlotLabel,
        expected: SlotType,
        actual: SlotType,
    },
}

#[derive(Error, Debug, Eq, PartialEq)]
pub enum InputSlotError {
    #[error("input slot `{0:?}` does not exist")]
    InvalidSlot(SlotLabel),
    #[error("attempted to retrieve a value of type `{actual}` from input slot `{label:?}`, which has type `{expected}`")]
    MismatchedSlotType {
        label: SlotLabel,
        expected: SlotType,
        actual: SlotType,
    },
}