Commit Graph

7 Commits

Author SHA1 Message Date
andriyDev
f95f42b44a
Allow calling add_render_graph_node on World. (#19912)
# Objective

- This unblocks some work I am doing for #19887.

## Solution

- Rename `RenderGraphApp` to `RenderGraphExt`.
- Implement `RenderGraphExt` for `World`.
- Change `SubApp` and `App` to call the `World` impl.
2025-07-02 14:56:18 +00:00
charlotte 🌸
92e65d5eb1
Upgrade to Rust 1.88 (#19825) 2025-06-26 19:38:19 +00:00
Joona Aalto
7b1c9f192e
Adopt consistent FooSystems naming convention for system sets (#18900)
# Objective

Fixes a part of #14274.

Bevy has an incredibly inconsistent naming convention for its system
sets, both internally and across the ecosystem.

<img alt="System sets in Bevy"
src="https://github.com/user-attachments/assets/d16e2027-793f-4ba4-9cc9-e780b14a5a1b"
width="450" />

*Names of public system set types in Bevy*

Most Bevy types use a naming of `FooSystem` or just `Foo`, but there are
also a few `FooSystems` and `FooSet` types. In ecosystem crates on the
other hand, `FooSet` is perhaps the most commonly used name in general.
Conventions being so wildly inconsistent can make it harder for users to
pick names for their own types, to search for system sets on docs.rs, or
to even discern which types *are* system sets.

To reign in the inconsistency a bit and help unify the ecosystem, it
would be good to establish a common recommended naming convention for
system sets in Bevy itself, similar to how plugins are commonly suffixed
with `Plugin` (ex: `TimePlugin`). By adopting a consistent naming
convention in first-party Bevy, we can softly nudge ecosystem crates to
follow suit (for types where it makes sense to do so).

Choosing a naming convention is also relevant now, as the [`bevy_cli`
recently adopted
lints](https://github.com/TheBevyFlock/bevy_cli/pull/345) to enforce
naming for plugins and system sets, and the recommended naming used for
system sets is still a bit open.

## Which Name To Use?

Now the contentious part: what naming convention should we actually
adopt?

This was discussed on the Bevy Discord at the end of last year, starting
[here](<https://discord.com/channels/691052431525675048/692572690833473578/1310659954683936789>).
`FooSet` and `FooSystems` were the clear favorites, with `FooSet` very
narrowly winning an unofficial poll. However, it seems to me like the
consensus was broadly moving towards `FooSystems` at the end and after
the poll, with Cart
([source](https://discord.com/channels/691052431525675048/692572690833473578/1311140204974706708))
and later Alice
([source](https://discord.com/channels/691052431525675048/692572690833473578/1311092530732859533))
and also me being in favor of it.

Let's do a quick pros and cons list! Of course these are just what I
thought of, so take it with a grain of salt.

`FooSet`:

- Pro: Nice and short!
- Pro: Used by many ecosystem crates.
- Pro: The `Set` suffix comes directly from the trait name `SystemSet`.
- Pro: Pairs nicely with existing APIs like `in_set` and
`configure_sets`.
- Con: `Set` by itself doesn't actually indicate that it's related to
systems *at all*, apart from the implemented trait. A set of what?
- Con: Is `FooSet` a set of `Foo`s or a system set related to `Foo`? Ex:
`ContactSet`, `MeshSet`, `EnemySet`...

`FooSystems`:

- Pro: Very clearly indicates that the type represents a collection of
systems. The actual core concept, system(s), is in the name.
- Pro: Parallels nicely with `FooPlugins` for plugin groups.
- Pro: Low risk of conflicts with other names or misunderstandings about
what the type is.
- Pro: In most cases, reads *very* nicely and clearly. Ex:
`PhysicsSystems` and `AnimationSystems` as opposed to `PhysicsSet` and
`AnimationSet`.
- Pro: Easy to search for on docs.rs.
- Con: Usually results in longer names.
- Con: Not yet as widely used.

Really the big problem with `FooSet` is that it doesn't actually
describe what it is. It describes what *kind of thing* it is (a set of
something), but not *what it is a set of*, unless you know the type or
check its docs or implemented traits. `FooSystems` on the other hand is
much more self-descriptive in this regard, at the cost of being a bit
longer to type.

Ultimately, in some ways it comes down to preference and how you think
of system sets. Personally, I was originally in favor of `FooSet`, but
have been increasingly on the side of `FooSystems`, especially after
seeing what the new names would actually look like in Avian and now
Bevy. I prefer it because it usually reads better, is much more clearly
related to groups of systems than `FooSet`, and overall *feels* more
correct and natural to me in the long term.

For these reasons, and because Alice and Cart also seemed to share a
preference for it when it was previously being discussed, I propose that
we adopt a `FooSystems` naming convention where applicable.

## Solution

Rename Bevy's system set types to use a consistent `FooSet` naming where
applicable.

- `AccessibilitySystem` → `AccessibilitySystems`
- `GizmoRenderSystem` → `GizmoRenderSystems`
- `PickSet` → `PickingSystems`
- `RunFixedMainLoopSystem` → `RunFixedMainLoopSystems`
- `TransformSystem` → `TransformSystems`
- `RemoteSet` → `RemoteSystems`
- `RenderSet` → `RenderSystems`
- `SpriteSystem` → `SpriteSystems`
- `StateTransitionSteps` → `StateTransitionSystems`
- `RenderUiSystem` → `RenderUiSystems`
- `UiSystem` → `UiSystems`
- `Animation` → `AnimationSystems`
- `AssetEvents` → `AssetEventSystems`
- `TrackAssets` → `AssetTrackingSystems`
- `UpdateGizmoMeshes` → `GizmoMeshSystems`
- `InputSystem` → `InputSystems`
- `InputFocusSet` → `InputFocusSystems`
- `ExtractMaterialsSet` → `MaterialExtractionSystems`
- `ExtractMeshesSet` → `MeshExtractionSystems`
- `RumbleSystem` → `RumbleSystems`
- `CameraUpdateSystem` → `CameraUpdateSystems`
- `ExtractAssetsSet` → `AssetExtractionSystems`
- `Update2dText` → `Text2dUpdateSystems`
- `TimeSystem` → `TimeSystems`
- `AudioPlaySet` → `AudioPlaybackSystems`
- `SendEvents` → `EventSenderSystems`
- `EventUpdates` → `EventUpdateSystems`

A lot of the names got slightly longer, but they are also a lot more
consistent, and in my opinion the majority of them read much better. For
a few of the names I took the liberty of rewording things a bit;
definitely open to any further naming improvements.

There are still also cases where the `FooSystems` naming doesn't really
make sense, and those I left alone. This primarily includes system sets
like `Interned<dyn SystemSet>`, `EnterSchedules<S>`, `ExitSchedules<S>`,
or `TransitionSchedules<S>`, where the type has some special purpose and
semantics.

## Todo

- [x] Should I keep all the old names as deprecated type aliases? I can
do this, but to avoid wasting work I'd prefer to first reach consensus
on whether these renames are even desired.
- [x] Migration guide
- [x] Release notes
2025-05-06 15:18:03 +00:00
Greeble
235257ff62
Fix occlusion culling not respecting device limits (#18974)
The occlusion culling plugin checks for a GPU feature by looking at
`RenderAdapter`. This is wrong - it should be checking `RenderDevice`.
See these notes for background:
https://github.com/bevyengine/bevy/discussions/18973

I don't have any evidence that this was causing any bugs, so right now
it's just a precaution.

## Testing

```
cargo run --example occlusion_culling
```

Tested on Win10/Nvidia across Vulkan, WebGL/Chrome, WebGPU/Chrome.
2025-05-05 18:04:43 +00:00
Patrick Walton
8f36106f9e
Split out the IndirectParametersMetadata into CPU-populated and GPU-populated buffers. (#17863)
The GPU can fill out many of the fields in `IndirectParametersMetadata`
using information it already has:

* `early_instance_count` and `late_instance_count` are always
initialized to zero.

* `mesh_index` is already present in the work item buffer as the
`input_index` of the first work item in each batch.

This patch moves these fields to a separate buffer, the *GPU indirect
parameters metadata* buffer. That way, it avoids having to write them on
CPU during `batch_and_prepare_binned_render_phase`. This effectively
reduces the number of bits that that function must write per mesh from
160 to 64 (in addition to the 64 bits per mesh *instance*).

Additionally, this PR refactors `UntypedPhaseIndirectParametersBuffers`
to add another layer, `MeshClassIndirectParametersBuffers`, which allows
abstracting over the buffers corresponding indexed and non-indexed
meshes. This patch doesn't make much use of this abstraction, but
forthcoming patches will, and it's overall a cleaner approach.

This didn't seem to have much of an effect by itself on
`batch_and_prepare_binned_render_phase` time, but subsequent PRs
dependent on this PR yield roughly a 2× speedup.
2025-02-18 00:53:44 +00:00
Patrick Walton
0ede857103
Build batches across phases in parallel. (#17764)
Currently, invocations of `batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` can't run in parallel because
they write to scene-global GPU buffers. After PR #17698,
`batch_and_prepare_binned_render_phase` started accounting for the
lion's share of the CPU time, causing us to be strongly CPU bound on
scenes like Caldera when occlusion culling was on (because of the
overhead of batching for the Z-prepass). Although I eventually plan to
optimize `batch_and_prepare_binned_render_phase`, we can obtain
significant wins now by parallelizing that system across phases.

This commit splits all GPU buffers that
`batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` touches into separate buffers
for each phase so that the scheduler will run those phases in parallel.
At the end of batch preparation, we gather the render phases up into a
single resource with a new *collection* phase. Because we already run
mesh preprocessing separately for each phase in order to make occlusion
culling work, this is actually a cleaner separation. For example, mesh
output indices (the unique ID that identifies each mesh instance on GPU)
are now guaranteed to be sequential starting from 0, which will simplify
the forthcoming work to remove them in favor of the compute dispatch ID.

On Caldera, this brings the frame time down to approximately 9.1 ms with
occlusion culling on.

![Screenshot 2025-02-08
210720](https://github.com/user-attachments/assets/44bed500-e323-4786-b40c-828b75bc7d3f)
2025-02-13 00:02:20 +00:00
Patrick Walton
dda97880c4
Implement experimental GPU two-phase occlusion culling for the standard 3D mesh pipeline. (#17413)
*Occlusion culling* allows the GPU to skip the vertex and fragment
shading overhead for objects that can be quickly proved to be invisible
because they're behind other geometry. A depth prepass already
eliminates most fragment shading overhead for occluded objects, but the
vertex shading overhead, as well as the cost of testing and rejecting
fragments against the Z-buffer, is presently unavoidable for standard
meshes. We currently perform occlusion culling only for meshlets. But
other meshes, such as skinned meshes, can benefit from occlusion culling
too in order to avoid the transform and skinning overhead for unseen
meshes.

This commit adapts the same [*two-phase occlusion culling*] technique
that meshlets use to Bevy's standard 3D mesh pipeline when the new
`OcclusionCulling` component, as well as the `DepthPrepass` component,
are present on the camera. It has these steps:

1. *Early depth prepass*: We use the hierarchical Z-buffer from the
previous frame to cull meshes for the initial depth prepass, effectively
rendering only the meshes that were visible in the last frame.

2. *Early depth downsample*: We downsample the depth buffer to create
another hierarchical Z-buffer, this time with the current view
transform.

3. *Late depth prepass*: We use the new hierarchical Z-buffer to test
all meshes that weren't rendered in the early depth prepass. Any meshes
that pass this check are rendered.

4. *Late depth downsample*: Again, we downsample the depth buffer to
create a hierarchical Z-buffer in preparation for the early depth
prepass of the next frame. This step is done after all the rendering, in
order to account for custom phase items that might write to the depth
buffer.

Note that this patch has no effect on the per-mesh CPU overhead for
occluded objects, which remains high for a GPU-driven renderer due to
the lack of `cold-specialization` and retained bins. If
`cold-specialization` and retained bins weren't on the horizon, then a
more traditional approach like potentially visible sets (PVS) or low-res
CPU rendering would probably be more efficient than the GPU-driven
approach that this patch implements for most scenes. However, at this
point the amount of effort required to implement a PVS baking tool or a
low-res CPU renderer would probably be greater than landing
`cold-specialization` and retained bins, and the GPU driven approach is
the more modern one anyway. It does mean that the performance
improvements from occlusion culling as implemented in this patch *today*
are likely to be limited, because of the high CPU overhead for occluded
meshes.

Note also that this patch currently doesn't implement occlusion culling
for 2D objects or shadow maps. Those can be addressed in a follow-up.
Additionally, note that the techniques in this patch require compute
shaders, which excludes support for WebGL 2.

This PR is marked experimental because of known precision issues with
the downsampling approach when applied to non-power-of-two framebuffer
sizes (i.e. most of them). These precision issues can, in rare cases,
cause objects to be judged occluded that in fact are not. (I've never
seen this in practice, but I know it's possible; it tends to be likelier
to happen with small meshes.) As a follow-up to this patch, we desire to
switch to the [SPD-based hi-Z buffer shader from the Granite engine],
which doesn't suffer from these problems, at which point we should be
able to graduate this feature from experimental status. I opted not to
include that rewrite in this patch for two reasons: (1) @JMS55 is
planning on doing the rewrite to coincide with the new availability of
image atomic operations in Naga; (2) to reduce the scope of this patch.

A new example, `occlusion_culling`, has been added. It demonstrates
objects becoming quickly occluded and disoccluded by dynamic geometry
and shows the number of objects that are actually being rendered. Also,
a new `--occlusion-culling` switch has been added to `scene_viewer`, in
order to make it easy to test this patch with large scenes like Bistro.

[*two-phase occlusion culling*]:
https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

[Aaltonen SIGGRAPH 2015]:

https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf

[Some literature]:

https://gist.github.com/reduz/c5769d0e705d8ab7ac187d63be0099b5?permalink_comment_id=5040452#gistcomment-5040452

[SPD-based hi-Z buffer shader from the Granite engine]:
https://github.com/Themaister/Granite/blob/master/assets/shaders/post/hiz.comp

## Migration guide

* When enqueuing a custom mesh pipeline, work item buffers are now
created with
`bevy::render::batching::gpu_preprocessing::get_or_create_work_item_buffer`,
not `PreprocessWorkItemBuffers::new`. See the
`specialized_mesh_pipeline` example.

## Showcase

Occlusion culling example:
![Screenshot 2025-01-15
175051](https://github.com/user-attachments/assets/1544f301-68a3-45f8-84a6-7af3ad431258)

Bistro zoomed out, before occlusion culling:
![Screenshot 2025-01-16
185425](https://github.com/user-attachments/assets/5114bbdf-5dec-4de9-b17e-7aa77e7b61ed)

Bistro zoomed out, after occlusion culling:
![Screenshot 2025-01-16
184949](https://github.com/user-attachments/assets/9dd67713-656c-4276-9768-6d261ca94300)

In this scene, occlusion culling reduces the number of meshes Bevy has
to render from 1591 to 585.
2025-01-27 05:02:46 +00:00