The `check_visibility` system currently follows this algorithm:
1. Store all meshes that were visible last frame in the
`PreviousVisibleMeshes` set.
2. Determine which meshes are visible. For each such visible mesh,
remove it from `PreviousVisibleMeshes`.
3. Mark all meshes that remain in `PreviousVisibleMeshes` as invisible.
This algorithm would be correct if `check_visibility` were the only
system that marked meshes visible. However, it's not: the shadow-related
systems `check_dir_light_mesh_visibility` and
`check_point_light_mesh_visibility` can as well. This results in the
following sequence of events for meshes that are in a shadow map but
*not* visible from a camera:
A. `check_visibility` runs, finds that no camera contains these meshes,
and marks them hidden, which sets the changed flag.
B. `check_dir_light_mesh_visibility` and/or
`check_point_light_mesh_visibility` run, discover that these meshes
are visible in the shadow map, and mark them as visible, again
setting the `ViewVisibility` changed flag.
C. During the extraction phase, the mesh extraction system sees that
`ViewVisibility` is changed and re-extracts the mesh.
This is inefficient and results in needless work during rendering.
This patch fixes the issue in two ways:
* The `check_dir_light_mesh_visibility` and
`check_point_light_mesh_visibility` systems now remove meshes that they
discover from `PreviousVisibleMeshes`.
* Step (3) above has been moved from `check_visibility` to a separate
system, `mark_newly_hidden_entities_invisible`. This system runs after
all visibility-determining systems, ensuring that
`PreviousVisibleMeshes` contains only those meshes that truly became
invisible on this frame.
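The revised flow can be sketched with plain sets. This is a toy model, not the actual systems: `MeshId`, `newly_hidden`, and the slice parameters are hypothetical stand-ins for entities and for the camera and light visibility passes.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a mesh entity ID.
type MeshId = u32;

/// Models one frame and returns the meshes that must be marked invisible.
fn newly_hidden(
    visible_last_frame: &HashSet<MeshId>,
    visible_from_cameras: &[MeshId],
    visible_from_lights: &[MeshId],
) -> HashSet<MeshId> {
    // Step 1: start from everything that was visible last frame.
    let mut previous_visible = visible_last_frame.clone();

    // Step 2: the camera pass removes meshes that are still visible.
    for mesh in visible_from_cameras {
        previous_visible.remove(mesh);
    }

    // New with this patch: the light passes remove their meshes too.
    for mesh in visible_from_lights {
        previous_visible.remove(mesh);
    }

    // Step 3, now run last: whatever remains truly became hidden.
    previous_visible
}
```

In this model a mesh seen only by a shadow map is never reported as hidden, so its changed flag would not be touched.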
This fix dramatically improves the performance of [the Caldera
benchmark], when combined with several other patches I've submitted.
[the Caldera benchmark]:
https://github.com/DGriffin91/bevy_caldera_scene
PR #17688 broke motion vector computation, and therefore motion blur,
because it enabled retention of `MeshInputUniform`s, and
`MeshInputUniform`s contain the indices of the previous frame's
transform and the previous frame's skinned mesh joint matrices. On frame
N, if a `MeshInputUniform` is retained on GPU from the previous frame,
the `previous_input_index` and `previous_skin_index` would refer to the
indices for frame N - 2, not the indices for frame N - 1.
This patch fixes both problems in two different ways, one for
transforms and one for skins:
1. To fix transforms, this patch supplies the *frame index* to the
shader as part of the view uniforms, and specifies which frame index
each mesh's previous transform refers to. So, in the situation described
above, the frame index would be N, the previous frame index would be N -
1, and the `previous_input_frame_number` would be N - 2. The shader can
now detect this situation and infer that the mesh has been retained, and
can therefore conclude that the mesh's transform hasn't changed.
2. To fix skins, this patch replaces the explicit `previous_skin_index`
with an invariant that the index of the joints for the current frame and
the index of the joints for the previous frame are the same. This means
that the `MeshInputUniform` never has to be updated even if the skin is
animated. The downside is that we have to copy joint matrices from the
previous frame's buffer to the current frame's buffer in
`extract_skins`.
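The check in (1) can be modeled on the CPU as follows. This is only a sketch of the idea: the real logic lives in the shader, and the function name, the `Option` return, and the wrapping arithmetic are assumptions made for illustration.

```rust
/// Hypothetical CPU-side model of the shader's check.
///
/// Returns the index of the transform to treat as "previous", or `None`
/// when the mesh was retained, in which case the transform is unchanged
/// and the current transform can serve as the previous one.
fn previous_transform_index(
    current_frame: u32,
    previous_input_frame_number: u32,
    previous_input_index: u32,
) -> Option<u32> {
    // The stored index is only meaningful if it was written for frame N - 1.
    // Wrapping subtraction is an assumption about how frame 0 is handled.
    if previous_input_frame_number == current_frame.wrapping_sub(1) {
        Some(previous_input_index)
    } else {
        None
    }
}
```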
The rationale behind (2) is that we currently have no mechanism to
detect when joints that affect a skin have been updated, short of
comparing all the transforms and setting a flag for
`extract_meshes_for_gpu_building` to consume, which would regress
performance as we want `extract_skins` and
`extract_meshes_for_gpu_building` to be able to run in parallel.
To test this change, use `cargo run --example motion_blur`.
Currently, the specialized pipeline cache maps a (view entity, mesh
entity) tuple to the retained pipeline for that entity. This causes two
problems:
1. Using the view entity is incorrect, because the view entity isn't
stable from frame to frame.
2. Switching the view entity to a `RetainedViewEntity`, which is
necessary for correctness, significantly regresses performance of
`specialize_material_meshes` and `specialize_shadows` because of the
loss of the fast `EntityHash`.
This patch fixes both problems by switching to a *two-level* hash table.
The outer level of the table maps each `RetainedViewEntity` to an inner
table, which maps each `MainEntity` to its pipeline ID and change tick.
Because we loop over views first and, within that loop, loop over
entities visible from that view, we hoist the slow lookup of the view
entity out of the inner entity loop.
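A minimal sketch of the two-level layout, using `std` hash maps. The type aliases are hypothetical stand-ins for `RetainedViewEntity`, `MainEntity`, the change tick, and the pipeline ID; the real cache uses Bevy's own hashers and types.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real types carry more data.
type ViewKey = (u64, u32);
type EntityKey = u64;
type PipelineId = u32;
type Tick = u32;

#[derive(Default)]
struct SpecializedPipelineCache {
    // Outer level: one (slow) lookup per view.
    // Inner level: one (fast) lookup per entity.
    views: HashMap<ViewKey, HashMap<EntityKey, (Tick, PipelineId)>>,
}

impl SpecializedPipelineCache {
    /// Records pipelines for every entity visible from one view.
    fn specialize_view(&mut self, view: ViewKey, visible: &[(EntityKey, PipelineId)], tick: Tick) {
        // The view lookup is hoisted out of the entity loop.
        let per_view = self.views.entry(view).or_default();
        for &(entity, pipeline) in visible {
            per_view.insert(entity, (tick, pipeline));
        }
    }

    /// Removing a view drops its whole inner table along with it.
    fn remove_view(&mut self, view: &ViewKey) {
        self.views.remove(view);
    }

    fn get(&self, view: &ViewKey, entity: EntityKey) -> Option<PipelineId> {
        self.views.get(view)?.get(&entity).map(|&(_, id)| id)
    }
}
```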
Additionally, this patch fixes a bug whereby pipeline IDs were leaked
when removing the view. We still have a problem with leaking pipeline
IDs for deleted entities, but that won't be fixed until the specialized
pipeline cache is retained.
This patch improves performance of the [Caldera benchmark] from 7.8×
faster than 0.14 to 9.0× faster than 0.14, when applied on top of the
global binding arrays PR, #17898.
[Caldera benchmark]: https://github.com/DGriffin91/bevy_caldera_scene
Currently, Bevy rebuilds the buffer containing all the transforms for
joints every frame, during the extraction phase. This is inefficient in
cases in which many skins are present in the scene and their joints
don't move, such as the Caldera test scene.
To address this problem, this commit switches skin extraction to use a
set of retained GPU buffers with allocations managed by the offset
allocator. I use fine-grained change detection in order to determine
which skins need updating. Note that the granularity is on the level of
an entire skin, not individual joints. Using change detection at the
level of individual joints would yield poor performance in common cases
in which an entire skin is animated at once. Also, this patch yields additional
performance from the fact that changing joint transforms no longer
requires the skinned mesh to be re-extracted.
Note that this optimization can be a double-edged sword. In
`many_foxes`, fine-grained change detection regressed the performance of
`extract_skins` by 3.4x. This is because every joint is updated every
frame in that example, so change detection is pointless and is pure
overhead. Because the `many_foxes` workload is actually representative
of animated scenes, this patch includes a heuristic that disables
fine-grained change detection if the number of transformed entities in
the frame exceeds a certain fraction of the total number of joints.
Currently, this threshold is set to 25%. Note that this is a crude
heuristic, because it doesn't distinguish between the number of
transformed *joints* and the number of transformed *entities*; however,
it should be good enough to yield the optimum code path most of the
time.
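The heuristic amounts to a single comparison. The sketch below uses the 25% figure from the text; the constant name, the function name, and the handling of an empty scene are assumptions.

```rust
/// Fraction of the joint count above which fine-grained change detection
/// is skipped. The value comes from the description; the name is made up.
const CHANGE_DETECTION_THRESHOLD: f64 = 0.25;

/// Returns `true` if per-skin change detection should be used this frame.
fn use_fine_grained_change_detection(transformed_entities: usize, total_joints: usize) -> bool {
    if total_joints == 0 {
        // Assumption: with no joints there is nothing to gain either way.
        return true;
    }
    // Note that this compares transformed *entities* against *joints*,
    // which is the crude part of the heuristic.
    (transformed_entities as f64) <= CHANGE_DETECTION_THRESHOLD * (total_joints as f64)
}
```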
Finally, this patch fixes a bug whereby skinned meshes are
incorrectly retained if the buffer offsets of the joints of those
skinned meshes change from frame to frame. To fix this without
retaining skins, we would have to re-extract every skinned mesh every
frame. Doing this was a significant regression on Caldera. With this PR,
by contrast, mesh joints stay at the same buffer offset, so we don't
have to update the `MeshInputUniform` containing the buffer offset every
frame. This also makes PR #17717 easier to implement, because that PR
uses the buffer offset from the previous frame, and the logic for
calculating that is simplified if the previous frame's buffer offset is
guaranteed to be identical to that of the current frame.
On Caldera, this patch reduces the time spent in `extract_skins` from
1.79 ms to near zero. On `many_foxes`, this patch regresses the
performance of `extract_skins` by approximately 10%-25%, depending on
the number of foxes. This has only a small impact on frame rate.
The GPU can fill out many of the fields in `IndirectParametersMetadata`
using information it already has:
* `early_instance_count` and `late_instance_count` are always
initialized to zero.
* `mesh_index` is already present in the work item buffer as the
`input_index` of the first work item in each batch.
This patch moves these fields to a separate buffer, the *GPU indirect
parameters metadata* buffer. That way, it avoids having to write them on
CPU during `batch_and_prepare_binned_render_phase`. This effectively
reduces the number of bits that that function must write per mesh from
160 to 64 (in addition to the 64 bits per mesh *instance*).
Additionally, this PR refactors `UntypedPhaseIndirectParametersBuffers`
to add another layer, `MeshClassIndirectParametersBuffers`, which allows
abstracting over the buffers corresponding to indexed and non-indexed
meshes. This patch doesn't make much use of this abstraction, but
forthcoming patches will, and it's overall a cleaner approach.
This didn't seem to have much of an effect by itself on
`batch_and_prepare_binned_render_phase` time, but subsequent PRs
dependent on this PR yield roughly a 2× speedup.
# Objective
Alternative to #17894 that also cleans up the workaround from the
previous version
## Solution
Bump version and remove entry from `typos` config
# Objective
- #17787 removed sweeping of binned render phases from 2D by accident
due to them not using the `BinnedRenderPhasePlugin`.
- Fixes #17885
## Solution
- Schedule `sweep_old_entities` in `QueueSweep` like
`BinnedRenderPhasePlugin` does, but for 2D where that plugin is not
used.
## Testing
Tested with the modified `shader_defs` example in #17885.
Fixes #17290.
<details>
<summary>Compilation errors before fix</summary>
`cargo clippy --tests --all-features --package bevy_image`:
```rust
error[E0061]: this function takes 7 arguments but 6 arguments were supplied
--> crates/bevy_core_pipeline/src/tonemapping/mod.rs:451:5
|
451 | Image::from_buffer(
| ^^^^^^^^^^^^^^^^^^
...
454 | bytes,
| ----- argument #1 of type `std::string::String` is missing
|
note: associated function defined here
--> /Users/josiahnelson/Desktop/Programming/Rust/bevy/crates/bevy_image/src/image.rs:930:12
|
930 | pub fn from_buffer(
| ^^^^^^^^^^^
help: provide the argument
|
451 | Image::from_buffer(/* std::string::String */, bytes, image_type, CompressedImageFormats::NONE, false, image_sampler, RenderAssetUsages::RENDER_WORLD)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
`cargo clippy --tests --all-features --package bevy_gltf`:
```rust
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_channel`
--> crates/bevy_gltf/src/loader.rs:1343:13
|
1343 | specular_channel: specular.specular_channel,
| ^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_texture`
--> crates/bevy_gltf/src/loader.rs:1345:13
|
1345 | specular_texture: specular.specular_texture,
| ^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_tint_channel`
--> crates/bevy_gltf/src/loader.rs:1351:13
|
1351 | specular_tint_channel: specular.specular_color_channel,
| ^^^^^^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_tint_texture`
--> crates/bevy_gltf/src/loader.rs:1353:13
|
1353 | specular_tint_texture: specular.specular_color_texture,
| ^^^^^^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
```
</details>
# Objective
Fixes #17022
## Solution
Only enable `bevy_gltf/dds` if `bevy_gltf` is already enabled.
## Testing
Tested with empty project
```toml
[dependencies]
bevy = { version = "0.16.0-dev", path = "../bevy", default-features = false, features = [
"dds",
] }
```
### Before
```
 cargo tree --depth 1 -i bevy_gltf
bevy_gltf v0.16.0-dev (/Users/robparrett/src/bevy/crates/bevy_gltf)
└── bevy_internal v0.16.0-dev (/Users/robparrett/src/bevy/crates/bevy_internal)
```
### After
```
 cargo tree --depth 1 -i bevy_gltf
warning: nothing to print.
To find dependencies that require specific target platforms, try to use option `--target all` first, and then narrow your search scope accordingly.
```
# Context
Renaming `Parent` to `ChildOf` in #17247 has been contentious. While
those users' concerns are valid (especially around legibility of code
IMO!), @cart [has
decided](https://discord.com/channels/691052431525675048/749335865876021248/1340434322833932430)
to stick with the new name.
> In general this conversation is unsurprising to me, as it played out
essentially the same way when I asked for opinions in my PR. There are
strong opinions on both sides. Everyone is right in their own way.
>
> I chose ChildOf for the following reasons:
>
> 1. I think it derives naturally from the system we have built, the
concepts we have chosen, and how we generally name the types that
implement a trait in Rust. This is the name of the type implementing
Relationship. We are adding that Relationship component to a given
entity (whether it "is" the relationship or "has" the relationship is
kind of immaterial ... we are naming the relationship that it "is" or
"has"). What is the name of the relationship that a child has to its
parent? It is a "child" of the parent of course!
> 2. In general the non-parent/child relationships I've seen in the wild
generally benefit from (or need to) use the naming convention in (1)
(aka calling the Relationship the name of the relationship the entity
has). Many relationships don't have an equivalent to the Parent/Child
name concept.
> 3. I do think we could get away with using (1) for pretty much
everything else and special casing Parent/Children. But by embracing the
naming convention, we help establish that this is in fact a pattern, and
we help prime people to think about these things in a consistent way.
Consistency and predictability is a generally desirable property. And
for something as divisive and polarizing as relationship naming, I think
drawing a hard line in the sand is to the benefit of the community as a
whole.
> 4. I believe the fact that we dont see as much of the XOf naming style
elsewhere is to our benefit. When people see things in that style, they
are primed to think of them as relationships (after some exposure to
Bevy and the ecosystem). I consider this a useful hint.
> 5. Most of the practical confusion from using ChildOf seems to be from
calling the value of the target field we read from the relationship
child_of. The name of the target field should be parent (we could even
consider renaming child_of.0 to child_of.parent for clarity). I suspect
that existing Bevy users renaming their existing code will feel the most
friction here, as this requires a reframing. Imo it is natural and
expected to receive pushback from these users hitting this case.
## Objective
The new documentation doesn't do a particularly good job at quickly
explaining the meaning of each component or how to work with them,
making a tricky migration more painful and slowing down new users as
they learn about some of the most fundamental types in Bevy.
## Solution
1. Clearly explain what each component does in the very first line,
assuming no background knowledge. These are the first relationships that
99% of users will encounter, so explaining that they are relationships
is unhelpful as an introduction.
2. Add doc aliases for the rejected `IsParent`/`IsChild`/`Parent` names,
to improve autocomplete and doc searching.
3. Do some assorted docs cleanup while we're here.
---------
Co-authored-by: Eagster <79881080+ElliottjPierce@users.noreply.github.com>
## Objective
There's no general error for when an entity doesn't exist, and some
methods are going to need one when they get Resultified. The closest
thing is `EntityFetchError`, but that error has a slightly more specific
purpose.
## Solution
- Added `EntityDoesNotExistError`.
- Contains `Entity` and `EntityDoesNotExistDetails`.
- Changed `EntityFetchError` and `QueryEntityError`:
- Changed `NoSuchEntity` variant to wrap `EntityDoesNotExistError` and
renamed the variant to `EntityDoesNotExist`.
- Renamed `EntityFetchError` to `EntityMutableFetchError` to make its
purpose clearer.
- Renamed `TryDespawnError` to `EntityDespawnError` to make it more
general.
- Changed `World::inspect_entity` to return `Result<[ok],
EntityDoesNotExistError>` instead of panicking.
- Changed `World::get_entity` and `WorldEntityFetch::fetch_ref` to
return `Result<[ok], EntityDoesNotExistError>` instead of `Result<[ok],
Entity>`.
- Changed `UnsafeWorldCell::get_entity` to return
`Result<UnsafeEntityCell, EntityDoesNotExistError>` instead of
`Option<UnsafeEntityCell>`.
## Migration Guide
- `World::inspect_entity` now returns `Result<impl Iterator<Item =
&ComponentInfo>, EntityDoesNotExistError>` instead of `impl
Iterator<Item = &ComponentInfo>`.
- `World::get_entity` now returns `EntityDoesNotExistError` as an error
instead of `Entity`. You can still access the entity's ID through the
error's `entity` field.
- `UnsafeWorldCell::get_entity` now returns `Result<UnsafeEntityCell,
EntityDoesNotExistError>` instead of `Option<UnsafeEntityCell>`.
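For migration purposes, the shape of the new error can be illustrated with a toy model. Nothing here is Bevy's real API: `ToyWorld` and the `u32` entity are hypothetical, and the sketch omits the `EntityDoesNotExistDetails` payload that the real error carries.

```rust
use std::fmt;

/// Toy stand-in for `Entity`.
type Entity = u32;

/// Toy version of the new error type, keeping only the `entity` field.
#[derive(Debug, PartialEq, Eq)]
struct EntityDoesNotExistError {
    entity: Entity,
}

impl fmt::Display for EntityDoesNotExistError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "entity {} does not exist", self.entity)
    }
}

/// Toy world that only tracks which entities are alive.
struct ToyWorld {
    alive: Vec<Entity>,
}

impl ToyWorld {
    /// Returns a structured error where an older API of this shape would
    /// have returned the bare `Entity`.
    fn get_entity(&self, entity: Entity) -> Result<Entity, EntityDoesNotExistError> {
        if self.alive.contains(&entity) {
            Ok(entity)
        } else {
            Err(EntityDoesNotExistError { entity })
        }
    }
}
```

Code that previously matched on `Err(entity)` would, under this model, read `err.entity` instead.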
# Objective
Fix panic in `custom_render_phase`.
This example was broken by #17764, but that breakage evolved into a
panic after #17849. This new panic seems to illustrate the problem in a
pretty straightforward way.
```
2025-02-15T00:44:11.833622Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "macOS 15.3 Sequoia", kernel: "24.3.0", cpu: "Apple M4 Max", core_count: "16", memory: "64.0 GiB" }
2025-02-15T00:44:11.908328Z INFO bevy_render::renderer: AdapterInfo { name: "Apple M4 Max", vendor: 0, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Metal }
2025-02-15T00:44:12.314930Z INFO bevy_winit::system: Creating new window App (0v1)
thread 'Compute Task Pool (1)' panicked at /Users/me/src/bevy/crates/bevy_ecs/src/system/function_system.rs:216:28:
bevy_render::batching::gpu_preprocessing::batch_and_prepare_sorted_render_phase<custom_render_phase::Stencil3d, custom_render_phase::StencilPipeline> could not access system parameter ResMut<PhaseBatchedInstanceBuffers<Stencil3d, MeshUniform>>
```
## Solution
Add a `SortedRenderPhasePlugin` for the custom phase.
## Testing
`cargo run --example custom_render_phase`
Appending to these vectors is performance-critical in
`batch_and_prepare_binned_render_phase`, so `RawBufferVec`, which
doesn't have the overhead of `encase`, is more appropriate.
The `collect_buffers_for_phase` system tries to reuse these buffers, but
its efforts are stymied by the fact that
`clear_batched_gpu_instance_buffers` clears the containing hash table
and therefore frees the buffers. This patch makes
`clear_batched_gpu_instance_buffers` stop doing that so that the
allocations can be reused.
# Objective
Simplify the API surface by removing duplicated functionality between
`Query` and `QueryState`.
Reduce the amount of `unsafe` code required in `QueryState`.
This is a follow-up to #15858.
## Solution
Move implementations of `Query` methods from `QueryState` to `Query`.
Instead of the original methods being on `QueryState`, with `Query`
methods calling them by passing the individual parameters, the original
methods are now on `Query`, with `QueryState` methods calling them by
constructing a `Query`.
This also adds two `_inner` methods that were missed in #15858:
`iter_many_unique_inner` and `single_inner`.
One goal here is to be able to deprecate and eventually remove many of
the methods on `QueryState`, reducing the overall API surface. (I
expected to do that in this PR, but this change was large enough on its
own!) Now that the `QueryState` methods each consist of a simple
expression like `self.query(world).get_inner(entity)`, a future PR can
deprecate some or all of them with simple migration instructions.
The other goal is to reduce the amount of `unsafe` code. The current
implementation of a read-only method like `QueryState::get` directly
calls the `unsafe fn get_unchecked_manual` and needs to repeat the proof
that `&World` has enough access. With this change, `QueryState::get` is
entirely safe code, with the proof that `&World` has enough access done
by the `query()` method and shared across all read-only operations.
## Future Work
The next step will be to mark the `QueryState` methods as
`#[deprecated]` and migrate callers to the methods on `Query`.
# Objective
Support accessing resources using reflection when using
`FilteredResources` in a dynamic system. This is similar to how
components can be queried using reflection when using
`FilteredEntityRef|Mut`.
## Solution
Change `ReflectResource` from taking `&World` and `&mut World` to taking
`impl Into<FilteredResources>` and `impl Into<FilteredResourcesMut>`,
similar to how `ReflectComponent` takes `impl Into<FilteredEntityRef>`
and `impl Into<FilteredEntityMut>`. There are `From` impls that ensure
code passing `&World` and `&mut World` continues to work as before.
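The `impl Into<...>` pattern that keeps existing call sites compiling can be sketched with toy types. These are not Bevy's definitions: the `score` field and `reflect_score` function are invented for illustration.

```rust
/// Toy stand-in for `World`, holding a single "resource".
struct World {
    score: u32,
}

/// Toy read-only view that, in the real API, could be restricted to a
/// subset of resources.
struct FilteredResources<'w> {
    world: &'w World,
}

/// This conversion is what lets callers keep passing `&World`.
impl<'w> From<&'w World> for FilteredResources<'w> {
    fn from(world: &'w World) -> Self {
        FilteredResources { world }
    }
}

/// Accepts a `FilteredResources` or anything convertible into one.
fn reflect_score<'w>(resources: impl Into<FilteredResources<'w>>) -> u32 {
    let resources: FilteredResources<'w> = resources.into();
    resources.world.score
}
```

Both `reflect_score(&world)` and `reflect_score(FilteredResources { world: &world })` type-check against this signature.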
## Migration Guide
If you are manually creating a `ReflectComponentFns` struct, the
`reflect` function now takes `FilteredResources` instead of `&World`, and
there is a new `reflect_mut` function that takes `FilteredResourcesMut`.
# Objective
Add reference to reported position space in picking backend docs.
Fixes #17844
## Solution
Add explanatory docs to the implementation notes of each picking
backend.
## Testing
`cargo r -p ci -- doc-check` & `cargo r -p ci -- lints`
# Objective
It is impossible to register a type with `TypeRegistry::register` if the
type is unnameable (in the current scope).
## Solution
Add `TypeRegistry::register_by_val` which mirrors std's `size_of_val`
and friends.
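The `_by_val` idea can be shown with a miniature registry. This is a sketch only: `MiniRegistry` is hypothetical and stores nothing but type names, whereas the real `TypeRegistry` stores full registrations.

```rust
use std::any::{type_name, TypeId};
use std::collections::HashMap;

/// Hypothetical miniature registry; not Bevy's `TypeRegistry`.
#[derive(Default)]
struct MiniRegistry {
    names: HashMap<TypeId, &'static str>,
}

impl MiniRegistry {
    /// Requires the caller to be able to write the type `T`.
    fn register<T: 'static>(&mut self) {
        self.names.insert(TypeId::of::<T>(), type_name::<T>());
    }

    /// Infers `T` from a value, in the spirit of `std::mem::size_of_val`,
    /// so types that cannot be named (such as closures) can be registered.
    fn register_by_val<T: 'static>(&mut self, _value: &T) {
        self.register::<T>();
    }

    fn contains<T: 'static>(&self) -> bool {
        self.names.contains_key(&TypeId::of::<T>())
    }

    fn len(&self) -> usize {
        self.names.len()
    }
}
```

A closure such as `let c = || 42u32;` has a type that cannot be written out, so `registry.register::<...>()` is not an option, while `registry.register_by_val(&c)` works.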
## Testing
There's a doc test (unrelated but there seem to be some pre-existing
broken doc links in `bevy_reflect`).
There was nonsense code in `batch_and_prepare_sorted_render_phase` that
created temporary buffers to add objects to instead of using the correct
ones. I think this was debug code. This commit removes that code in
favor of writing to the actual buffers.
Closes #17846.
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
`bevy_assets` has long been unapproachable for contributors and users.
More and better documentation would help that.
We're gradually moving towards globally denying missing docs (#3492)!
However, writing all of the hundreds of missing doc strings in a single
go will be miserable to review.
## Solution
Remove the allow for missing docs temporarily, and then pick some easy
missing doc warnings largely at random to tackle.
Stop when the change set is starting to feel intimidating.
# Objective
Fixes #17851
## Solution
Align the `slider` uniform to 16 bytes by making it a `vec4`.
## Testing
Run the example using:
```
cargo run -p build-wasm-example -- --api webgl2 ui_material
basic-http-server examples/wasm/
```
The `output_index` field is only used in direct mode, and the
`indirect_parameters_index` field is only used in indirect mode.
Consequently, we can combine them into a single field, reducing the size
of `PreprocessWorkItem`, which
`batch_and_prepare_{binned,sorted}_render_phase` must construct every
frame for every mesh instance, from 96 bits to 64 bits.
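The size arithmetic can be checked with two `#[repr(C)]` structs. The struct names and the merged field name are hypothetical, and the sketch assumes the remaining 32 bits of the work item hold an `input_index`.

```rust
use std::mem::size_of;

/// Layout before the change: three 32-bit fields (assumed).
#[allow(dead_code)]
#[repr(C)]
struct WorkItemBefore {
    input_index: u32,
    output_index: u32,              // only used in direct mode
    indirect_parameters_index: u32, // only used in indirect mode
}

/// Layout after the change: the two mode-specific fields share one slot.
#[allow(dead_code)]
#[repr(C)]
struct WorkItemAfter {
    input_index: u32,
    output_or_indirect_parameters_index: u32, // hypothetical name
}

/// Size of `T` in bits.
fn bits<T>() -> usize {
    size_of::<T>() * 8
}
```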
# Objective
Continuation of #16547.
We do not yet have parallel versions of `iter_many` and
`iter_many_unique`. It is currently very painful to try and use
parallel iteration over entity lists. Even if a list is not long, each
operation might still be very expensive, and worth parallelizing.
Plus, it has been requested several times!
## Solution
Once again, we implement what we lack!
These parallel iterators collect their input entity list into a
`Vec`/`UniqueEntityVec`, then chunk that over the available threads,
inspired by the original `par_iter`.
Since no order guarantee is given to the caller, we could sort the input
list according to `EntityLocation`, but that would likely only be worth
it for very large entity lists.
There is some duplication which could likely be improved, but I'd like
to leave that for a follow-up.
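The collect-then-chunk approach described above can be sketched with scoped threads from `std`. This is not the actual implementation: the real iterators run on Bevy's task pool, and `par_for_each_many`, `EntityId`, and the explicit `thread_count` parameter are hypothetical.

```rust
use std::thread;

/// Hypothetical stand-in for `Entity`.
type EntityId = u32;

/// Collects the entity list, splits it into roughly equal chunks, and
/// runs `op` on each chunk in its own scoped thread.
fn par_for_each_many<I, F>(entities: I, thread_count: usize, op: F)
where
    I: IntoIterator<Item = EntityId>,
    F: Fn(EntityId) + Sync,
{
    // Collect first so the input can be sliced.
    let collected: Vec<EntityId> = entities.into_iter().collect();
    if collected.is_empty() {
        return;
    }

    // Ceiling division so every entity lands in some chunk.
    let threads = thread_count.max(1);
    let chunk_size = (collected.len() + threads - 1) / threads;

    thread::scope(|scope| {
        for chunk in collected.chunks(chunk_size) {
            let op = &op;
            scope.spawn(move || {
                // No ordering guarantee across chunks.
                for &entity in chunk {
                    op(entity);
                }
            });
        }
    });
}
```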
## Testing
The doc tests on `for_each_init` of `QueryParManyIter` and
`QueryParManyUniqueIter`.
# Objective
While surveying the state of documentation for bevy_assets, I noticed a
few minor issues.
## Solution
Revise the docs to focus on clear explanations of core ideas and
cross-linking related objects.
# Objective
Update typos, fix new typos.
1.29.6 was just released to fix an
[issue](https://github.com/crate-ci/typos/issues/1228) where January's
corrections were not included in the binaries for the last release.
Reminder: typos can be tossed in the monthly [non-critical corrections
issue](https://github.com/crate-ci/typos/issues/1221).
## Solution
I chose to allow `implementors`, because a good argument is being made
[here](https://github.com/crate-ci/typos/issues/1226) and
there is now a PR to address that.
## Discussion
Should I exclude `bevy_mikktspace`?
At one point I think we had an informal policy of "don't mess with
mikktspace until https://github.com/bevyengine/bevy/pull/9050 is merged"
but it doesn't seem like that is likely to be merged any time soon.
I think these particular corrections in mikktspace are fine because
- The same typo mistake seems to have been fixed in that PR
- The entire file containing these corrections was deleted in that PR
## Typo of the Month
correspindong -> corresponding
# Objective
Updates the now inaccurate position docs
Fixes #17832
## Solution
From
`The position of the intersection in the world, if the data is available
from the backend.`
To
`The position reported by the backend, if the data is available.
Position data may be in any space (e.g. World space, Screen space, Local
space), specified by the backend providing it.`
## Testing
uhh reading :)
Currently, invocations of `batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` can't run in parallel because
they write to scene-global GPU buffers. After PR #17698,
`batch_and_prepare_binned_render_phase` started accounting for the
lion's share of the CPU time, causing us to be strongly CPU bound on
scenes like Caldera when occlusion culling was on (because of the
overhead of batching for the Z-prepass). Although I eventually plan to
optimize `batch_and_prepare_binned_render_phase`, we can obtain
significant wins now by parallelizing that system across phases.
This commit splits all GPU buffers that
`batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` touch into separate buffers
for each phase so that the scheduler will run those phases in parallel.
At the end of batch preparation, we gather the render phases up into a
single resource with a new *collection* phase. Because we already run
mesh preprocessing separately for each phase in order to make occlusion
culling work, this is actually a cleaner separation. For example, mesh
output indices (the unique ID that identifies each mesh instance on GPU)
are now guaranteed to be sequential starting from 0, which will simplify
the forthcoming work to remove them in favor of the compute dispatch ID.
On Caldera, this brings the frame time down to approximately 9.1 ms with
occlusion culling on.

# Objective
Fix unsoundness introduced by #15858. `QueryLens::query()` would hand
out a `Query` with the full `'w` lifetime, and the new `_inner` methods
would let the results outlive the `Query`. This could be used to create
aliasing mutable references, like
```rust
fn bad<'w>(mut lens: QueryLens<'w, EntityMut>, entity: Entity) {
    let one: EntityMut<'w> = lens.query().get_inner(entity).unwrap();
    let two: EntityMut<'w> = lens.query().get_inner(entity).unwrap();
    assert!(one.entity() == two.entity());
}
```
Fixes #17693
## Solution
Restrict the `'world` lifetime in the `Query` returned by
`QueryLens::query()` to `'_`, the lifetime of the borrow of the
`QueryLens`.
The model here is that `Query<'w, 's, D, F>` and `QueryLens<'w, D, F>`
have permission to access their components for the lifetime `'w`. So
going from `&'a mut QueryLens<'w>` to `Query<'w, 'a>` would borrow the
permission only for the `'a` lifetime, but incorrectly give it out for
the full `'w` lifetime.
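The lifetime model can be illustrated with toy types. This is a sketch of the principle only: `Lens` and `Query` here wrap a plain `Vec<u32>` and are not Bevy's types.

```rust
/// Toy lens that holds permission to mutate `data` for `'w`.
struct Lens<'w> {
    data: &'w mut Vec<u32>,
}

/// Toy query that holds the same permission for `'a`.
struct Query<'a> {
    data: &'a mut Vec<u32>,
}

impl<'w> Lens<'w> {
    /// The returned query is limited to the borrow of `self` (`'_`),
    /// not the full `'w`, so two live queries cannot coexist.
    fn query(&mut self) -> Query<'_> {
        Query { data: &mut *self.data }
    }
}

impl<'a> Query<'a> {
    /// Consumes the query and hands out a result for all of `'a`.
    fn get_inner(self, index: usize) -> Option<&'a mut u32> {
        self.data.get_mut(index)
    }
}
```

With these signatures, taking two results from `lens.query().get_inner(0)` and holding both at once is rejected by the borrow checker, while using them one after the other compiles.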
To handle any cases where users were calling `get_inner()` or
`iter_inner()` on the `Query` and expecting the full `'w` lifetime, we
introduce a new `QueryLens::query_inner()` method. This is only valid
for `ReadOnlyQueryData`, so it may safely hand out a copy of the
permission for the full `'w` lifetime. Since `get_inner()` and
`iter_inner()` were only valid on `ReadOnlyQueryData` prior to #15858,
that should cover any uses that relied on the longer lifetime.
## Migration Guide
Users of `QueryLens::query()` who were calling `get_inner()` or
`iter_inner()` will need to replace the call with
`QueryLens::query_inner()`.
Conceptually, bins are ordered hash maps. We currently implement these
as a list of keys with an associated hash map. But we already have a
data type that implements ordered hash maps directly: `IndexMap`. This
patch switches Bevy to use `IndexMap`s for bins. Because we're memory
bound, this doesn't affect performance much, but it is cleaner.
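For comparison, the "list of keys with an associated hash map" representation looks roughly like the following. This is a hypothetical sketch of the old shape, not the removed code. `IndexMap` provides the same insertion-ordered behavior in a single type.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hand-rolled ordered map: a key list for order plus a map for lookup.
struct OrderedBins<K, V> {
    keys: Vec<K>,
    values: HashMap<K, V>,
}

impl<K: Clone + Eq + Hash, V> OrderedBins<K, V> {
    fn new() -> Self {
        OrderedBins { keys: Vec::new(), values: HashMap::new() }
    }

    /// Inserts or overwrites; a key keeps its original position.
    fn insert(&mut self, key: K, value: V) {
        // Both structures must be kept in sync by hand.
        if self.values.insert(key.clone(), value).is_none() {
            self.keys.push(key);
        }
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.values.get(key)
    }

    /// Keys in insertion order.
    fn keys_in_order(&self) -> &[K] {
        &self.keys
    }
}
```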
# Objective
Related to #17784. The ticket is actually about just getting rid of
`Entity{Ref,Mut}Except` in favor of `FilteredEntity{Ref,Mut}`, but I got
told the unification of Entity types is a bigger endeavor that has been
going on for a while now (as the "Pointing Fingers" working group) and I
should just add the functions I actually need in the meantime.
## Solution
This PR adds all of the functions necessary to access components by
TypeId or ComponentId instead of static types.
## Testing
> Did you test these changes? If so, how?
Haven't tested it yet, but the changes are mostly copy/paste from other
implementations in the same file, since there is a lot of duplicated
functionality there.
## Not a Migration Guide
There shouldn't be any breaking changes, it's just a few new functions
on existing types.
I had to shuffle around the lifetimes in `From<&EntityMutExcept<'a, B>>
for EntityRefExcept<'a, B>` (originally it was `From<&'a
EntityMutExcept<'_, B>> for EntityRefExcept<'_, B>`) to make the borrow
checker happy, but I don't think that this should have an impact on user
code (correct me if I'm wrong).
* Use texture atomics rather than buffer atomics for the visbuffer
(haven't tested perf on a raster-heavy scene yet)
* Unfortunately, clearing the visbuffer now requires a compute pass.
Wgpu's clear_texture function internally uses a buffer -> image copy
that's insanely expensive. Ideally it should be using
vkCmdClearColorImage, for which I've opened an issue:
https://github.com/gfx-rs/wgpu/issues/7090. For now we'll have to stick
with a custom compute pass and all the extra code that brings.
* Faster resolve depth pass by discarding 0 depth pixels instead of
redundantly writing zero (2x faster for big depth textures like shadow
views)
# Objective
Tidy up a few little things I noticed while working with this example
## Solution
- Fix manual resetting of a repeating timer
- Use atlas image size instead of hardcoded value. Atlases are always
512x512 right now, but hopefully not in the future.
- Pluralize a variable name for a variable holding a `Vec`
# Objective
I'm working on some PRs involving our font atlases and it would be nice
to be able to test these scenarios separately to better understand the
performance tradeoffs in different situations.
## Solution
Add a `many-font-sizes` option.
The old behavior is still available by running with `--many-glyphs
--many-font-sizes`.
## Testing
`cargo run --example many_text2d --release`
`cargo run --example many_text2d --release -- --many-glyphs`
`cargo run --example many_text2d --release -- --many-font-sizes`
`cargo run --example many_text2d --release -- --many-glyphs
--many-font-sizes`
# Objective
- Wgpu has some expensive code it injects into shaders to avoid the
possibility of things like infinite loops. Generally our shaders are
written by users who won't write such code, so the checks just make our
shaders perform worse.
## Solution
- Turn off the checks.
- We could try to conditionally keep them, but that complicates the code
and 99.9% of users won't want this.
## Migration Guide
- Bevy no longer turns on wgpu's runtime safety checks
https://docs.rs/wgpu/latest/wgpu/struct.ShaderRuntimeChecks.html. If you
were using Bevy with untrusted shaders, please file an issue.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
Continuation of #17589 and #16547.
Slices have several methods that return iterators which themselves yield
slices, which we have not yet implemented.
An example use is `par_iter_many` style logic.
## Solution
Their implementation is rather straightforward: we simply delegate all
impls to `[T]`.
The resulting iterator types need their own wrappers in the form of
`UniqueEntitySliceIter` and `UniqueEntitySliceIterMut`.
We also add three free functions that cast slices of entity slices to
slices of `UniqueEntitySlice`.
These three should be sufficient, though infinite nesting is achievable
with a trait (like `TrustedEntityBorrow` works over infinite reference
nesting), should the need ever arise.
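As a rough illustration of the delegation described above, here is a minimal std-only sketch. `UniqueSlice` and `UniqueChunks` are hypothetical stand-ins invented for this example (they are not the `UniqueEntitySlice` or `UniqueEntitySliceIter` types from this PR), and only `chunks` is shown. The idea is that the wrapper forwards to the `[T]` method and re-wraps each yielded sub-slice.

```rust
use std::slice::Chunks;

/// Hypothetical stand-in for a "unique" slice: a slice whose elements are
/// promised by the caller to be pairwise distinct.
#[repr(transparent)]
pub struct UniqueSlice<T>([T]);

impl<T> UniqueSlice<T> {
    /// # Safety
    /// Every element of `slice` must be distinct from all the others.
    pub unsafe fn from_slice_unchecked(slice: &[T]) -> &Self {
        // SAFETY: `UniqueSlice<T>` is `repr(transparent)` over `[T]`, so both
        // reference types share the same layout and metadata.
        unsafe { &*(slice as *const [T] as *const Self) }
    }

    /// Gives back the plain slice.
    pub fn as_inner(&self) -> &[T] {
        &self.0
    }

    /// Delegates to `<[T]>::chunks` and wraps the resulting iterator.
    pub fn chunks(&self, chunk_size: usize) -> UniqueChunks<'_, T> {
        UniqueChunks {
            inner: self.0.chunks(chunk_size),
        }
    }
}

/// Iterator wrapper that yields `&UniqueSlice<T>` instead of `&[T]`.
pub struct UniqueChunks<'a, T> {
    inner: Chunks<'a, T>,
}

impl<'a, T> Iterator for UniqueChunks<'a, T> {
    type Item = &'a UniqueSlice<T>;

    fn next(&mut self) -> Option<Self::Item> {
        self.inner
            .next()
            // SAFETY: a sub-slice of a duplicate-free slice is duplicate-free.
            .map(|chunk| unsafe { UniqueSlice::from_slice_unchecked(chunk) })
    }
}
```

Each chunk keeps the uniqueness guarantee because it is a sub-slice of a slice that already had it, which is why the re-wrap needs no runtime check.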
# Objective
Fixes #17810
Y'all picked the option I already implemented, yay.
## Solution
Add a system that panics if the load state of an asset is `Failed`.
## Testing
`cargo run --example scene`
- Tested with valid scene file
- Introduced a syntax error in the scene file
- Deleted the scene file
## Objective
Get rid of a redundant Cargo feature flag.
## Solution
Use the built-in `target_abi = "sim"` cfg, which is set for the iOS (and
visionOS and tvOS) simulator, instead of a custom Cargo feature flag.
This has been stable since Rust 1.78.
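For readers who haven't used this cfg before, a minimal sketch of how code can be gated on it follows. The function names are made up for illustration and are not taken from Bevy; the only assumption is a toolchain of Rust 1.78 or newer.

```rust
/// Returns `true` when this crate was compiled for an Apple simulator target.
/// (Hypothetical helper, not part of Bevy.)
pub fn compiled_for_simulator() -> bool {
    cfg!(target_abi = "sim")
}

/// Simulator-only code path; compiled only when `target_abi = "sim"`.
#[cfg(target_abi = "sim")]
pub fn platform_note() -> &'static str {
    "simulator build: apply the workaround"
}

/// Everything else (real devices, desktop, and so on).
#[cfg(not(target_abi = "sim"))]
pub fn platform_note() -> &'static str {
    "non-simulator build: no workaround needed"
}
```

Because the compiler sets the cfg from the target triple, users no longer have to opt in through `Cargo.toml`.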
In the future, some of this may become redundant if Wgpu implements
proper support for the iOS Simulator:
https://github.com/gfx-rs/wgpu/issues/7057
CC @mockersf who implemented [the original
fix](https://github.com/bevyengine/bevy/pull/10178).
## Testing
- Open mobile example in Xcode.
- Launch the simulator.
- See that no errors are emitted.
- Remove the code cfg-guarded behind `target_abi = "sim"`.
- See that an error now happens.
(I haven't actually performed these steps on the latest `main`, because
I'm hitting an unrelated error (EDIT: it was
https://github.com/bevyengine/bevy/pull/17637), but I tested it on
0.15.0.)
---
## Migration Guide
> If you're using a project that builds upon the mobile example, remove
the `ios_simulator` feature from your `Cargo.toml` (Bevy now handles
this internally).
Currently, we look up each `MeshInputUniform` index in a hash table that
maps the main entity ID to the index every frame. This is inefficient,
cache unfriendly, and unnecessary, as the `MeshInputUniform` index for
an entity remains the same from frame to frame (even if the input
uniform changes). This commit changes the `IndexSet` in the `RenderBin`
to an `IndexMap` that maps the `MainEntity` to `MeshInputUniformIndex`
(a new type that this patch adds for more type safety).
On Caldera with parallel `batch_and_prepare_binned_render_phase`, this
patch improves that function from 3.18 ms to 2.42 ms, a 31% speedup.
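To make the data-structure change concrete, here is a small std-only sketch. It assumes nothing about the real `RenderBin` beyond what is described above: the `Bin` type, its methods, and the integer newtypes are invented for illustration, and a `Vec` plus a `HashMap` stand in for `IndexMap`. The point is that once the uniform index is stored next to the entity, the per-frame batching loop can walk the entries directly instead of doing a hash lookup per entity.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct MainEntity(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MeshInputUniformIndex(pub u32);

/// Hypothetical bin: insertion-ordered entries plus a slot lookup table.
#[derive(Default)]
pub struct Bin {
    entries: Vec<(MainEntity, MeshInputUniformIndex)>,
    slots: HashMap<MainEntity, usize>,
}

impl Bin {
    /// Adds an entity, or updates its stored uniform index if already present.
    pub fn insert(&mut self, entity: MainEntity, index: MeshInputUniformIndex) {
        if let Some(&slot) = self.slots.get(&entity) {
            self.entries[slot].1 = index;
        } else {
            self.slots.insert(entity, self.entries.len());
            self.entries.push((entity, index));
        }
    }

    /// Removes an entity in O(1) by swapping the last entry into its slot.
    pub fn remove(&mut self, entity: MainEntity) -> Option<MeshInputUniformIndex> {
        let slot = self.slots.remove(&entity)?;
        let (_, index) = self.entries.swap_remove(slot);
        if let Some(&(moved, _)) = self.entries.get(slot) {
            self.slots.insert(moved, slot);
        }
        Some(index)
    }

    /// Hot path: iterate the cached indices with no hashing at all.
    pub fn uniform_indices(&self) -> impl Iterator<Item = MeshInputUniformIndex> + '_ {
        self.entries.iter().map(|&(_, index)| index)
    }
}
```

Hashing still happens on insertion and removal, but those only occur when the set of meshes in the bin changes, not every frame.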
Currently, when a mesh slab overflows, we recreate the allocator and
reinsert all the meshes that were in it in an arbitrary order. This can
result in the meshes moving around. Before `MeshInputUniform`s were
retained, this was slow but harmless, because the `MeshInputUniform`
that contained the positions of the vertex and index data in the slab
would be recreated every frame. However, with mesh retention, there's no
guarantee that the `MeshInputUniform`, which could be cached from the
previous frame, will reflect the new position of the mesh data within
the buffer if that buffer happened to grow. This manifested itself as
seeming mesh data corruption when adding many meshes dynamically to the
scene.
I can see three possible ways to fix this:
1. Invalidate and rebuild all the `MeshInputUniform`s belonging to all
meshes in a slab when that slab grows.
2. Introduce a second layer of indirection so that the
`MeshInputUniform` points to a *mesh allocation table* that contains the
current locations of the data of each mesh.
3. Avoid moving meshes when reallocating the buffer.
To be efficient, option (1) would require scanning meshes to see if
their positions changed, a la
`mark_meshes_as_changed_if_their_materials_changed`. Option (2) would
add more runtime indirection and would require additional bookkeeping on
the part of the allocator.
Therefore, this PR chooses option (3), which was remarkably simple to
implement. The key is that the offset allocator happens to allocate
addresses from low addresses to high addresses. So all we have to do is
to *conceptually* allocate the full 512 MiB mesh slab as far as the
offset allocator is concerned, and grow the underlying backing store
from 1 MiB to 512 MiB as needed. In other words, the allocator now
allocates *virtual* GPU memory, and the actual backing slab resizes to
fit the virtual memory. This ensures that the location of mesh data
remains constant for the lifetime of the mesh asset, and we can remove
the code that reinserts meshes one by one when the slab grows in favor
of a single buffer copy.
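The following is a simplified CPU-side sketch of that idea, not the code from this PR. `VirtualSlab` is a made-up type, a bump pointer stands in for the real offset allocator, and a `Vec<u8>` stands in for the GPU buffer. It only illustrates that offsets handed out against the full virtual capacity stay valid while the backing store grows.

```rust
/// Hypothetical slab: allocates offsets in a large virtual range while the
/// backing store grows on demand.
pub struct VirtualSlab {
    /// Size of the address space the allocator is allowed to hand out.
    virtual_capacity: usize,
    /// Bump pointer; a stand-in for the real offset allocator, which likewise
    /// allocates from low addresses to high addresses.
    next_offset: usize,
    /// Stand-in for the GPU buffer that actually holds the data.
    backing: Vec<u8>,
}

impl VirtualSlab {
    pub fn new(initial_backing: usize, virtual_capacity: usize) -> Self {
        Self {
            virtual_capacity,
            next_offset: 0,
            backing: vec![0; initial_backing.min(virtual_capacity)],
        }
    }

    /// Stores `data` and returns its offset, which never changes afterwards.
    pub fn allocate(&mut self, data: &[u8]) -> Option<usize> {
        let offset = self.next_offset;
        let end = offset.checked_add(data.len())?;
        if end > self.virtual_capacity {
            return None; // the virtual range itself is exhausted
        }
        if end > self.backing.len() {
            // Grow by doubling, capped at the virtual capacity. `resize`
            // preserves existing bytes, so this is one copy, not a reinsert.
            let mut new_len = self.backing.len().max(1);
            while new_len < end {
                new_len *= 2;
            }
            self.backing.resize(new_len.min(self.virtual_capacity), 0);
        }
        self.backing[offset..end].copy_from_slice(data);
        self.next_offset = end;
        Some(offset)
    }

    pub fn read(&self, offset: usize, len: usize) -> &[u8] {
        &self.backing[offset..offset + len]
    }

    pub fn backing_len(&self) -> usize {
        self.backing.len()
    }
}
```

In this sketch, anything that cached an offset before a growth (the analogue of a retained `MeshInputUniform`) can keep using it afterwards.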
Closes #17766.
# Objective
- Fixes #17797
## Solution
- `mesh` in `bevy_pbr::mesh_bindings` is behind an `ifndef
MESHLET_MESH_MATERIAL_PASS`. Gate `get_tag`, which uses this `mesh`, the
same way.
## Testing
- Run the meshlet example
# Objective
Allow cycling through the available tonemapping algorithms in the
`bloom_2d` example to compare them.
## Solution
Add a resource to `bloom_2d` that holds the current tonemapping
algorithm, a method to get the next one, and a key press check that
makes the switch.
## Testing
Ran `bloom_2d` example with modified code
## Showcase
https://github.com/user-attachments/assets/920b2d6a-b237-4b19-be9d-9b651b4dc913
Note: Sprite flashing is already described in #17763
# Objective
After #16894, this example started logging errors:
```
ERROR bevy_asset::server: Failed to load asset 'scenes/load_scene_example.scn.ron' with asset loader 'bevy_scene::scene_loader::SceneLoader': Could not parse RON: 10:33: Expected string
```
Fixes #17798, this is the only actionable/unreported issue in there as
far as I can tell.
## Solution
Update the serialized scene with the expected format for `Name`
## Testing
`cargo run --example scene`
## Discussion
This example breaks very often and we don't always catch it. It might be
nice to have this scene either
1. produce visual output so that it can be checked
2. panic if the scene fails to load (check for `LoadState::Failed`)
Either of those would make the failures visible in [the example
report](https://thebevyflock.github.io/bevy-example-runner/). Not sure
which method would best suit the example.
# Objective
Fix gltf validation errors in `Fox.glb`.
Inspired by #8099, but that issue doesn't appear to describe a real bug
to fix, as far as I can tell.
## Solution
Use the latest version of the Fox from
[glTF-Sample-Assets](https://github.com/KhronosGroup/glTF-Sample-Assets/blob/main/Models/Fox/glTF-Binary/Fox.glb).
## Testing
Dropped both versions in https://github.khronos.org/glTF-Validator/
`cargo run --example animated_mesh` seems to still look fine.
Before:
```
The asset contains errors.
"numErrors": 126,
"numWarnings": 4184,
```
After:
```
The asset is valid.
"numErrors": 0,
"numWarnings": 0,
```
## Discussion
The 3d testbed was panicking with
```
thread 'main' panicked at examples/testbed/3d.rs:288:60:
called `Result::unwrap()` on an `Err` value: QueryDoesNotMatch(35v1 with components Transform, GlobalTransform, Visibility, InheritedVisibility, ViewVisibility, ChildOf, Children, Name)
```
Which is bizarre. I think this might be related to #17720, or maybe the
structure of the gltf changed.
I fixed it by updating the testbed to use a more robust method of
finding the correct entity, as is done in `animated_mesh`.
# Objective
- In #17743, attention was raised to the fact that we supported an
unusual kind of step easing function. The author of the fix kindly
provided some links to standards used in CSS. It would be desirable to
support generally agreed upon standards so this PR here tries to
implement an extra configuration option of the step easing function
- Resolves #17744
## Solution
- Introduce `StepConfig`
- `StepConfig` can configure both the number of steps and the jumping
behavior of the function
- `StepConfig` replaces the raw `usize` parameter of the
`EasingFunction::Steps(usize)` variant.
- `StepConfig`'s default jumping behavior is `end`, so in that way it
follows #17743
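For reference, the four jumping behaviors can be sketched as a free function that follows the CSS `steps()` definition. This is a standalone illustration, not Bevy's implementation: the local `JumpAt` enum, its variant names, and `step_ease` are assumptions made for this example.

```rust
/// Local stand-in for the jumping behavior; variants mirror the CSS keywords
/// `jump-start`, `jump-end`, `jump-none` and `jump-both`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum JumpAt {
    Start,
    End,
    None,
    Both,
}

/// Evaluates a step easing with `steps` intervals at progress `t` in [0, 1],
/// following the CSS `steps()` definition.
pub fn step_ease(steps: usize, jump_at: JumpAt, t: f32) -> f32 {
    let t = t.clamp(0.0, 1.0);

    // Index of the interval that `t` falls into.
    let mut current = (t * steps as f32).floor() as usize;

    // `Start` and `Both` jump immediately at t = 0.
    if matches!(jump_at, JumpAt::Start | JumpAt::Both) {
        current += 1;
    }

    // Number of jumps the output makes between 0 and 1.
    let jumps = match jump_at {
        JumpAt::Start | JumpAt::End => steps,
        JumpAt::None => steps.saturating_sub(1),
        JumpAt::Both => steps + 1,
    };

    // Clamp so that t = 1 maps to 1, and avoid dividing by zero.
    let current = current.min(jumps);
    current as f32 / jumps.max(1) as f32
}
```

With four steps at `t = 0.3`, this sketch returns 0.25 for `End`, 0.5 for `Start`, about 0.333 for `None`, and 0.4 for `Both`.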
## Testing
- I added a new test per `JumpAt` jumping behavior. These tests
replicate the visuals that can be found at
https://developer.mozilla.org/en-US/docs/Web/CSS/easing-function/steps#description
## Migration Guide
- `EasingFunction::Steps` now uses a `StepConfig` instead of a raw
`usize`. You can replicate the previous behavior by replacing
`EasingFunction::Steps(10)` with
`EasingFunction::Steps(StepConfig::new(10))`.
---------
Co-authored-by: François Mockers <francois.mockers@vleue.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
## Objective
There's no need for the `span_index` and `color` variables in
`extract_text_shadows` and `extract_text_sections` and we can remove one
of the span index comparisons since text colors are only set per
section.
## Testing
<img width="454" alt="trace"
src="https://github.com/user-attachments/assets/3109d1df-0817-46c2-9889-0459ac93a42c"
/>