# Objective
In the Render World, there are a number of collections that are derived
from Main World entities and are used to drive rendering. The most
notable are:
- `VisibleEntities`, which is generated in the `check_visibility` system
and contains visible entities for a view.
- `ExtractedInstances`, which maps entity ids to asset ids.
In the old model, these collections were trivially kept in sync -- any
extracted phase item could look itself up because the render entity id
was guaranteed to always match the corresponding main world id.
After #15320, this became much more complicated, and was leading to a
number of subtle bugs in the Render World. The main rendering systems,
i.e. `queue_material_meshes` and `queue_material2d_meshes`, follow a
similar pattern:
```rust
for visible_entity in visible_entities.iter::<With<Mesh2d>>() {
    let Some(mesh_instance) = render_mesh_instances.get_mut(visible_entity) else {
        continue;
    };
    // Look some more stuff up and specialize the pipeline...
    let bin_key = Opaque2dBinKey {
        pipeline: pipeline_id,
        draw_function: draw_opaque_2d,
        asset_id: mesh_instance.mesh_asset_id.into(),
        material_bind_group_id: material_2d.get_bind_group_id().0,
    };
    opaque_phase.add(
        bin_key,
        *visible_entity,
        BinnedRenderPhaseType::mesh(mesh_instance.automatic_batching),
    );
}
```
In this case, `visible_entities` and `render_mesh_instances` are both
collections that are created and keyed by Main World entity ids, and so
this lookup happens to work by coincidence. However, there is a major
unintentional bug here: namely, because `visible_entities` is a
collection of Main World ids, the phase item being queued is created
with a Main World id rather than its correct Render World id.
This happens to not break mesh rendering because the render commands
used for drawing meshes do not access the `ItemQuery` parameter, but
demonstrates the confusion that is now possible: our UI phase items are
correctly being queued with Render World ids while our meshes aren't.
Additionally, this makes it very easy to accidentally use the wrong
entity id to look up things like assets. For example, if instead we
ignored visibility checks and queued our meshes via a query, we'd have
to be extra careful to use `&MainEntity` instead of the natural
`Entity`.
## Solution
Make all collections that are derived from Main World data use
`MainEntity` as their key, to ensure type safety and avoid accidentally
looking up data with the wrong entity id:
```rust
pub type MainEntityHashMap<V> = hashbrown::HashMap<MainEntity, V, EntityHash>;
```
Additionally, we make every `PhaseItem` able to provide both its Main
and Render World ids, to allow render phase implementors maximum
flexibility as to which id should be used to look up data.
You can think of this as tracking at the type level whether something
in the Render World should use its "primary key", i.e. the entity id, or
needs to use a foreign key, i.e. the `MainEntity`.
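For illustration, here is a rough sketch of how the earlier queue loop could
look under this scheme, assuming the visible-entity iterator yields
`(Entity, MainEntity)` pairs; the exact iterator and `add` signatures are
approximations, not the final API:
```rust
// Sketch: the `MainEntity` drives the instance lookup, while the phase item
// records both ids.
for (render_entity, visible_entity) in visible_entities.iter::<With<Mesh2d>>() {
    // `render_mesh_instances` is a `MainEntityHashMap`, so only a
    // `MainEntity` can be used for this lookup.
    let Some(mesh_instance) = render_mesh_instances.get_mut(visible_entity) else {
        continue;
    };
    let bin_key = Opaque2dBinKey {
        pipeline: pipeline_id,
        draw_function: draw_opaque_2d,
        asset_id: mesh_instance.mesh_asset_id.into(),
        material_bind_group_id: material_2d.get_bind_group_id().0,
    };
    // The phase item is queued with both its Render World and Main World ids.
    opaque_phase.add(
        bin_key,
        (*render_entity, *visible_entity),
        BinnedRenderPhaseType::mesh(mesh_instance.automatic_batching),
    );
}
```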
## Testing
##### TODO:
This will require extensive testing to make sure things didn't break!
Additionally, some extraction logic has become more complicated and
needs to be checked for regressions.
## Migration Guide
With the advent of the retained render world, collections that contain
references to `Entity` that are extracted into the render world have
been changed to contain `MainEntity` in order to prevent errors where a
render world entity id is used to look up an item by accident. Custom
rendering code may need to be changed to query for `&MainEntity` in
order to look up the correct item from such a collection. Additionally,
users who implement their own extraction logic for collections of main
world entities should strongly consider extracting into a different
collection that uses `MainEntity` as a key.
Additionally, render phases now require specifying both the `Entity` and
`MainEntity` for a given `PhaseItem`. Custom render phases should ensure
`MainEntity` is available when queuing a phase item.
# Objective
`EntityHash` and related types were moved from `bevy_utils` to
`bevy_ecs` in #11498, but seemed to have been accidentally reintroduced
a week later in #11707.
## Solution
Remove the old leftover code.
---
## Migration Guide
- Uses of `bevy::utils::{EntityHash, EntityHasher, EntityHashMap,
EntityHashSet}` now have to be imported from `bevy::ecs::entity`.
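In code, the migration is just an import path change (paths as given above):
```rust
// Before:
// use bevy::utils::{EntityHashMap, EntityHashSet};

// After:
use bevy::ecs::entity::{EntityHashMap, EntityHashSet};
```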
# Objective
The Android example on Adreno 642L currently crashes on startup.
Previous PRs #14176 and #13323 have addressed this specific crash
occurring on some Adreno GPUs. That fix works as it should, but isn't
applied when the GPU name contains a suffix, as in the case of
`642L`.
## Solution
- Amend the logic to filter out any parts of the GPU name that don't
contain digits, thus enabling the fix on `642L`.
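A minimal sketch of that filtering idea (the function name and exact parsing
here are illustrative, not the actual Bevy code):
```rust
/// Extracts an Adreno model number from an adapter name such as
/// "Adreno (TM) 642L": keep only the token that contains digits, then drop
/// any non-digit characters (like the trailing "L") before parsing.
fn adreno_model_number(adapter_name: &str) -> Option<u32> {
    let token = adapter_name
        .split_whitespace()
        .find(|token| token.chars().any(|c| c.is_ascii_digit()))?;
    let digits: String = token.chars().filter(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}
```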
## Testing
- Ran the Android example on a Nothing Phone 1. Before this change it
crashed; after it, the example works as intended.
---------
Co-authored-by: Sam Pettersson <sam.pettersson@geoguessr.com>
# Objective
- Fixes #14295
## Solution
- Early out when `GFBD::get_index_and_compare_data` returns `None`.
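A minimal, self-contained illustration of the early-out pattern (the data and
names here are placeholders standing in for the real
`GFBD::get_index_and_compare_data` call site):
```rust
fn queue_batches(per_entity_data: &[Option<(u32, u64)>]) {
    for data in per_entity_data {
        // If no index/compare data could be produced for this entity,
        // skip it rather than batching it with bogus values.
        let Some((index, compare_data)) = data else {
            continue;
        };
        println!("batching index {index}, compare data {compare_data}");
    }
}
```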
## Testing
- Tested on a selection of examples including `many_foxes` and
`3d_shapes`.
- Resolved the original issue in `bevy_vector_shapes`.
This commit uses the [`offset-allocator`] crate to combine vertex and
index arrays from different meshes into single buffers. Since the
primary source of `wgpu` overhead is from validation and synchronization
when switching buffers, this significantly improves Bevy's rendering
performance on many scenes.
This patch is a more flexible version of #13218, which also used slabs.
Unlike #13218, which used slabs of a fixed size, this commit implements
slabs that start small and can grow. In addition to reducing memory
usage, supporting slab growth reduces the number of vertex and index
buffer switches that need to happen during rendering, leading to
improved performance. To prevent pathological fragmentation behavior,
slabs are capped to a maximum size, and mesh arrays that are too large
get their own dedicated slabs.
As an additional improvement over #13218, this commit allows the
application to customize all allocator heuristics. The
`MeshAllocatorSettings` resource contains values that adjust the minimum
and maximum slab sizes, the cutoff point at which meshes get their own
dedicated slabs, and the rate at which slabs grow. Hopefully-sensible
defaults have been chosen for each value.
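A sketch of how an application might plug in its own settings; the import path
is approximate, and no fields are overridden here since their concrete names
aren't spelled out above. Adjust the slab sizes, dedicated-slab cutoff, and
growth rate on `MeshAllocatorSettings` as needed:
```rust
use bevy::prelude::*;
use bevy::render::mesh::allocator::MeshAllocatorSettings;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // Start from the defaults and tweak the heuristics before inserting.
        .insert_resource(MeshAllocatorSettings::default())
        .run();
}
```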
Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which
is necessary to pack vertex arrays from different meshes into the same
buffer. `wgpu` represents this restriction as the downlevel flag
`BASE_VERTEX`. This patch detects that bit and ensures that all vertex
buffers get dedicated slabs on that platform. Even on WebGL 2, though,
we can combine all *index* arrays into single buffers to reduce buffer
changes, and we do so.
The following measurements are on Bistro:
Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup):

Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup):

Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup):

## Migration Guide
### Changed
* Vertex and index buffers for meshes may now be packed alongside other
buffers, for performance.
* `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that
it no longer directly stores handles to GPU objects.
* Because meshes no longer have their own vertex and index buffers, the
responsibility for the buffers has moved from `GpuMesh` (now called
`RenderMesh`) to the `MeshAllocator` resource. To access the vertex data
for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index
data for a mesh, use `MeshAllocator::mesh_index_slice`.
[`offset-allocator`]: https://github.com/pcwalton/offset-allocator
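A sketch of the migration note above, fetching a mesh's slices through the new
resource (import paths, signatures, and field access are approximate):
```rust
use bevy::asset::AssetId;
use bevy::prelude::*;
use bevy::render::mesh::allocator::MeshAllocator;

// Render-world helper: given a mesh's asset id (obtained elsewhere, e.g. from
// the extracted mesh instances), ask the allocator where its data lives.
fn log_mesh_slices(mesh_allocator: &MeshAllocator, mesh_id: AssetId<Mesh>) {
    if let Some(vertex_slice) = mesh_allocator.mesh_vertex_slice(&mesh_id) {
        info!("vertex data occupies range {:?} of a shared buffer", vertex_slice.range);
    }
    if let Some(index_slice) = mesh_allocator.mesh_index_slice(&mesh_id) {
        info!("index data occupies range {:?} of a shared buffer", index_slice.range);
    }
}
```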
# Objective
Fixes #14146
## Solution
Expansion of #13323, excluding Adreno 730 and earlier.
## Testing
Tested on an Android device (Adreno 730) that used to crash
As reported in #14004, many third-party plugins, such as Hanabi, enqueue
entities that don't have meshes into render phases. However, the
introduction of indirect mode added a dependency on mesh-specific data,
breaking this workflow. This is because GPU preprocessing requires that
the render phases manage indirect draw parameters, which don't apply to
objects that aren't meshes. The existing code skips over binned entities
that don't have indirect draw parameters, which causes the rendering to
be skipped for such objects.
To support this workflow, this commit adds a new field,
`non_mesh_items`, to `BinnedRenderPhase`. This field contains a simple
list of (bin key, entity) pairs. After drawing batchable and unbatchable
objects, the non-mesh items are drawn one after another. Bevy itself
doesn't enqueue any items into this list; it exists solely for the
application and/or plugins to use.
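To make the shape of this concrete, here's a toy model of the structure
described above (not Bevy's actual types); the real API is demonstrated by the
new `custom_phase_item` example:
```rust
use std::collections::BTreeMap;

// Toy model: a binned phase keeps its usual batchable/unbatchable bins, plus a
// plain list of (bin key, entity) pairs for items that have no mesh and thus
// no indirect draw parameters. The non-mesh items are drawn after the bins.
struct ToyBinnedPhase<K: Ord, E> {
    batchable_bins: BTreeMap<K, Vec<E>>,
    unbatchable_bins: BTreeMap<K, Vec<E>>,
    // Bevy never fills this itself; it exists for applications and plugins.
    non_mesh_items: Vec<(K, E)>,
}

impl<K: Ord, E> ToyBinnedPhase<K, E> {
    fn add_non_mesh_item(&mut self, key: K, entity: E) {
        self.non_mesh_items.push((key, entity));
    }
}
```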
Additionally, this commit switches the asset ID in the standard bin keys
to be an untyped asset ID rather than that of a mesh. This allows more
flexibility, allowing bins to be keyed off any type of asset.
This patch adds a new example, `custom_phase_item`, which simultaneously
serves to demonstrate how to use this new feature and to act as a
regression test so this doesn't break again.
Fixes #14004.
## Changelog
### Added
* `BinnedRenderPhase` now contains a `non_mesh_items` field for plugins
to add custom items to.
# Objective
- Fixes #13728
## Solution
- Add a new feature, `smaa_luts`. If enabled, it also enables `ktx2` and
`zstd`. If not, the files aren't loaded and placeholders are used instead.
- Add all the needed resources in the same places where the systems that use
them are added.
# Objective
- Fixes #13038
## Solution
- Disable GPU preprocessing when the
`SAMPLED_TEXTURE_AND_STORAGE_BUFFER_ARRAY_NON_UNIFORM_INDEXING` feature is not
available
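A minimal sketch of the capability check involved (the helper name is
illustrative; Bevy's actual gating lives in its GPU preprocessing setup):
```rust
// Returns whether the adapter exposes the wgpu feature that GPU
// preprocessing relies on; if not, the renderer falls back to the CPU path.
fn gpu_preprocessing_supported(features: wgpu::Features) -> bool {
    features.contains(
        wgpu::Features::SAMPLED_TEXTURE_AND_STORAGE_BUFFER_ARRAY_NON_UNIFORM_INDEXING,
    )
}
```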
## Testing
- Tested on an Android device that used to crash
This commit makes us stop using the render world ECS for
`BinnedRenderPhase` and `SortedRenderPhase` and instead use resources
with `EntityHashMap`s inside. There are three reasons to do this:
1. We can use `clear()` to clear out the render phase collections
instead of recreating the components from scratch, allowing us to reuse
allocations.
2. This is a prerequisite for retained bins, because components can't be
retained from frame to frame in the render world, but resources can.
3. We want to move away from storing anything in components in the
render world ECS, and this is a step in that direction.
This patch results in a small performance benefit, due to point (1)
above.
## Changelog
### Changed
* The `BinnedRenderPhase` and `SortedRenderPhase` render world
components have been replaced with `ViewBinnedRenderPhases` and
`ViewSortedRenderPhases` resources.
## Migration Guide
* The `BinnedRenderPhase` and `SortedRenderPhase` render world
components have been replaced with `ViewBinnedRenderPhases` and
`ViewSortedRenderPhases` resources. Instead of querying for the
components, look the camera entity up in the
`ViewBinnedRenderPhases`/`ViewSortedRenderPhases` tables.
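A sketch of that lookup for an `Opaque3d` phase (system shape and import paths
are approximate):
```rust
use bevy::core_pipeline::core_3d::Opaque3d;
use bevy::prelude::*;
use bevy::render::render_phase::ViewBinnedRenderPhases;
use bevy::render::view::ExtractedView;

// Render-world system: instead of querying for a `BinnedRenderPhase<Opaque3d>`
// component on the view entity, index into the resource keyed by that entity.
fn queue_custom_items(
    opaque_phases: Res<ViewBinnedRenderPhases<Opaque3d>>,
    views: Query<Entity, With<ExtractedView>>,
) {
    for view_entity in &views {
        let Some(_opaque_phase) = opaque_phases.get(&view_entity) else {
            continue;
        };
        // ... queue phase items into the phase for this view here ...
    }
}
```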
This is an adoption of #12670 plus some documentation fixes. See that PR
for more details.
---
## Changelog
* Renamed `BufferVec` to `RawBufferVec` and added a new `BufferVec`
type.
## Migration Guide
`BufferVec` has been renamed to `RawBufferVec` and a new similar type
has taken the `BufferVec` name.
---------
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
In #12889, I mistakenly started dropping unbatchable sorted items on the
floor instead of giving them solitary batches. This caused the objects
in the `shader_instancing` demo to stop showing up. This patch fixes the
issue by giving those items their own batches as expected.
Fixes #13130.
This commit implements opt-in GPU frustum culling, built on top of the
infrastructure in https://github.com/bevyengine/bevy/pull/12773. To
enable it on a camera, add the `GpuCulling` component to it. To
additionally disable CPU frustum culling, add the `NoCpuCulling`
component. Note that adding `GpuCulling` without `NoCpuCulling`
*currently* does nothing useful. The reason why `GpuCulling` doesn't
automatically imply `NoCpuCulling` is that I intend to follow this patch
up with GPU two-phase occlusion culling, and CPU frustum culling plus
GPU occlusion culling seems like a very commonly-desired mode.
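For example, opting a camera into GPU-only frustum culling might look like
this (import paths are approximate):
```rust
use bevy::prelude::*;
use bevy::render::view::{GpuCulling, NoCpuCulling};

fn setup(mut commands: Commands) {
    commands.spawn((
        Camera3dBundle::default(),
        // Enable GPU frustum culling (puts the view into indirect mode).
        GpuCulling,
        // Additionally turn off CPU frustum culling for this view.
        NoCpuCulling,
    ));
}
```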
Adding the `GpuCulling` component to a view puts that view into
*indirect mode*. This mode makes all drawcalls indirect, relying on the
mesh preprocessing shader to allocate instances dynamically. In indirect
mode, the `PreprocessWorkItem` `output_index` points not to a
`MeshUniform` instance slot but instead to a set of `wgpu`
`IndirectParameters`, from which it allocates an instance slot
dynamically if frustum culling succeeds. Batch building has been updated
to allocate and track indirect parameter slots, and the AABBs are now
supplied to the GPU as `MeshCullingData`.
A small amount of code relating to the frustum culling has been borrowed
from meshlets and moved into `maths.wgsl`. Note that standard Bevy
frustum culling uses AABBs, while meshlets use bounding spheres; this
means that not as much code can be shared as one might think.
This patch doesn't provide any way to perform GPU culling on shadow
maps, to avoid making this patch bigger than it already is. That can be
a followup.
## Changelog
### Added
* Frustum culling can now optionally be done on the GPU. To enable it,
add the `GpuCulling` component to a camera.
* To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note
that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
# Objective
- `cargo run --release --example bevymark -- --benchmark --waves 160
--per-wave 1000 --mode mesh2d` runs slower and slower over time due to
`no_gpu_preprocessing::write_batched_instance_buffer<bevy_sprite::mesh2d::mesh::Mesh2dPipeline>`
taking longer and longer because the `BatchedInstanceBuffer` is not
cleared
## Solution
- Split the `clear_batched_instance_buffers` system into CPU and GPU
versions
- Use the CPU version for 2D meshes
Currently, `MeshUniform`s are rather large: 160 bytes. They're also
somewhat expensive to compute, because they involve taking the inverse
of a 3x4 matrix. Finally, if a mesh is present in multiple views, that
mesh will have a separate `MeshUniform` for each and every view, which
is wasteful.
This commit fixes these issues by introducing the concept of a *mesh
input uniform* and adding a *mesh uniform building* compute shader pass.
The `MeshInputUniform` is simply the minimum amount of data needed for
the GPU to compute the full `MeshUniform`. Most of this data is just the
transform and is therefore only 64 bytes. `MeshInputUniform`s are
computed during the *extraction* phase, much like skins are today, in
order to avoid needlessly copying transforms around on CPU. (In fact,
the render app has been changed to only store the translation of each
mesh; it no longer cares about any other part of the transform, which is
stored only on the GPU and the main world.) Before rendering, the
`build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the
full `MeshUniform`.
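For intuition, here is a rough sketch of the kind of payload such an input
uniform carries. The real `MeshInputUniform` layout differs, but per the
description it is essentially the transform plus the index of last frame's
entry, packed into 64 bytes:
```rust
/// Sketch only: roughly what a 64-byte mesh input uniform could look like.
#[repr(C)]
#[derive(Clone, Copy)]
struct MeshInputUniformSketch {
    /// Affine model transform, packed as 3 rows of 4 floats (48 bytes).
    transform: [[f32; 4]; 3],
    /// Index of the same mesh's input uniform in the *previous* frame's
    /// buffer, so the GPU can fetch last frame's transform for TAA.
    previous_input_index: u32,
    /// Flags plus padding to round the struct up to 64 bytes.
    flags: u32,
    _pad: [u32; 2],
}
```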
The mesh uniform building pass does the following, all on GPU:
1. Copy the appropriate fields of the `MeshInputUniform` to the
`MeshUniform` slot. If a single mesh is present in multiple views, this
effectively duplicates it into each view.
2. Compute the inverse transpose of the model transform, used for
transforming normals.
3. If applicable, copy the mesh's transform from the previous frame for
TAA. To support this, we double-buffer the `MeshInputUniform`s over two
frames and swap the buffers each frame. The `MeshInputUniform`s for the
current frame contain the index of that mesh's `MeshInputUniform` for
the previous frame.
This commit produces wins in virtually every CPU part of the pipeline:
`extract_meshes`, `queue_material_meshes`,
`batch_and_prepare_render_phase`, and especially
`write_batched_instance_buffer` are all faster. Shrinking the amount of
CPU data that has to be shuffled around speeds up the entire rendering
process.
| Benchmark | This branch | `main` | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc` | 17.259 | 24.529 | 42.12% |
| `many_cubes -nfc -vpi` | 302.116 | 312.123 | 3.31% |
| `many_foxes` | 3.227 | 3.515 | 8.92% |
Because mesh uniform building requires compute shaders, and WebGL 2 has
no compute shaders, the existing CPU mesh uniform building code has been
left as-is. Many types now have both CPU mesh uniform building and GPU
mesh uniform building modes. Developers can opt into the old CPU mesh
uniform building by setting the `use_gpu_uniform_builder` option on
`PbrPlugin` to `false`.
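A sketch of opting back into CPU building via the option named above (the
exact plugin field may differ):
```rust
use bevy::pbr::PbrPlugin;
use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins.set(PbrPlugin {
            // Fall back to building `MeshUniform`s on the CPU.
            use_gpu_uniform_builder: false,
            ..Default::default()
        }))
        .run();
}
```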
Below are graphs of the CPU portions of `many_cubes
--no-frustum-culling`. Yellow is this branch, red is `main`.
`extract_meshes`:

It's notable that we get a small win even though we're now writing to a
GPU buffer.
`queue_material_meshes`:

There's a bit of a regression here; not sure what's causing it. In any
case, it's far outweighed by the other gains.
`batch_and_prepare_render_phase`:

There's a huge win here, enough to make batching basically drop off the
profile.
`write_batched_instance_buffer`:

There's a massive improvement here, as expected. Note that a lot of it
simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't
a maintainability problem in my view because `MeshInputUniform` is so
simple: just 16 tightly-packed words.)
## Changelog
### Added
* Per-mesh instance data is now generated on GPU with a compute shader
instead of CPU, resulting in rendering performance improvements on
platforms where compute shaders are supported.
## Migration Guide
* Custom render phases now need multiple systems beyond just
`batch_and_prepare_render_phase`. Code that was previously creating
custom render phases should now add a `BinnedRenderPhasePlugin` or
`SortedRenderPhasePlugin` as appropriate instead of directly adding
`batch_and_prepare_render_phase`.