# Objective
Calling `define_label!` in a `no_std` 3rd party crate currently requires
the user to import `Box` themselves due to a non-fully-specified
reference to `Box`.
## Solution
Add a fully specified path for `Box` in the one location necessary, to
match all of the other cases.
Two-phase occlusion culling can be helpful for shadow maps just as it
can for a prepass, in order to reduce vertex and alpha mask fragment
shading overhead. This patch implements occlusion culling for shadow
maps from directional lights, when the `OcclusionCulling` component is
present on the entities containing the lights. Shadow maps from point
lights are deferred to a follow-up patch. Much of this patch involves
expanding the hierarchical Z-buffer to cover shadow maps in addition to
standard view depth buffers.
The `scene_viewer` example has been updated to add `OcclusionCulling` to
the directional light that it creates.
This improved the performance of the rend3 sci-fi test scene when
enabling shadows.
Currently, Bevy's implementation of bindless resources is rather
unusual: every binding in an object that implements `AsBindGroup` (most
commonly, a material) becomes its own separate binding array in the
shader. This is inefficient for two reasons:
1. If multiple materials reference the same texture or other resource,
the reference to that resource will be duplicated many times. This
increases `wgpu` validation overhead.
2. It creates many unused binding array slots. This increases `wgpu` and
driver overhead and makes it easier to hit limits on APIs that `wgpu`
currently imposes tight resource limits on, like Metal.
This PR fixes these issues by switching Bevy to use the standard
approach in GPU-driven renderers, in which resources are de-duplicated
and passed as global arrays, one for each type of resource.
Along the way, this patch introduces per-platform resource limits and
bumps them from 16 resources per binding array to 64 resources per bind
group on Metal and 2048 resources per bind group on other platforms.
(Note that the number of resources per *binding array* isn't the same as
the number of resources per *bind group*; as it currently stands, if all
the PBR features are turned on, Bevy could pack as many as 496 resources
into a single slab.) The limits have been increased because `wgpu` now
has universal support for partially-bound binding arrays, which mean
that we no longer need to fill the binding arrays with fallback
resources on Direct3D 12. The `#[bindless(LIMIT)]` declaration when
deriving `AsBindGroup` can now simply be written `#[bindless]` in order
to have Bevy choose a default limit size for the current platform.
Custom limits are still available with the new
`#[bindless(limit(LIMIT))]` syntax: e.g. `#[bindless(limit(8))]`.
The material bind group allocator has been completely rewritten. Now
there are two allocators: one for bindless materials and one for
non-bindless materials. The new non-bindless material allocator simply
maintains a 1:1 mapping from material to bind group. The new bindless
material allocator maintains a list of slabs and allocates materials
into slabs on a first-fit basis. This unfortunately makes its
performance O(number of resources per object * number of slabs), but the
number of slabs is likely to be low, and it's planned to become even
lower in the future with `wgpu` improvements. Resources are
de-duplicated with in a slab and reference counted. So, for instance, if
multiple materials refer to the same texture, that texture will exist
only once in the appropriate binding array.
To support these new features, this patch adds the concept of a
*bindless descriptor* to the `AsBindGroup` trait. The bindless
descriptor allows the material bind group allocator to probe the layout
of the material, now that an array of `BindGroupLayoutEntry` records is
insufficient to describe the group. The `#[derive(AsBindGroup)]` has
been heavily modified to support the new features. The most important
user-facing change to that macro is that the struct-level `uniform`
attribute, `#[uniform(BINDING_NUMBER, StandardMaterial)]`, now reads
`#[uniform(BINDLESS_INDEX, MATERIAL_UNIFORM_TYPE,
binding_array(BINDING_NUMBER)]`, allowing the material to specify the
binding number for the binding array that holds the uniform data.
To make this patch simpler, I removed support for bindless
`ExtendedMaterial`s, as well as field-level bindless uniform and storage
buffers. I intend to add back support for these as a follow-up. Because
they aren't in any released Bevy version yet, I figured this was OK.
Finally, this patch updates `StandardMaterial` for the new bindless
changes. Generally, code throughout the PBR shaders that looked like
`base_color_texture[slot]` now looks like
`bindless_2d_textures[material_indices[slot].base_color_texture]`.
This patch fixes a system hang that I experienced on the [Caldera test]
when running with `caldera --random-materials --texture-count 100`. The
time per frame is around 19.75 ms, down from 154.2 ms in Bevy 0.14: a
7.8× speedup.
[Caldera test]: https://github.com/DGriffin91/bevy_caldera_scene
Deferred rendering currently doesn't support occlusion culling. This PR
implements it in a straightforward way, mirroring what we already do for
the non-deferred pipeline.
On the rend3 sci-fi base test scene, this resulted in roughly a 2×
speedup when applied on top of my other patches. For that scene, it was
useful to add another option, `--add-light`, which forces the addition
of a shadow-casting light, to the scene viewer, which I included in this
patch.
This commit restructures the multidrawable batch set builder for better
performance in various ways:
* The bin traversal is optimized to make the best use of the CPU cache.
* The inner loop that iterates over the bins, which is the hottest part
of `batch_and_prepare_binned_render_phase`, has been shrunk as small as
possible.
* Where possible, multiple elements are added to or reserved from GPU
buffers as a batch instead of one at a time.
* Methods that LLVM wasn't inlining have been marked `#[inline]` where
doing so would unlock optimizations.
This code has also been refactored to avoid duplication between the
logic for indexed and non-indexed meshes via the introduction of a
`MultidrawableBatchSetPreparer` object.
Together, this improved the `batch_and_prepare_binned_render_phase` time
on Caldera by approximately 2×.
Eventually, we should optimize the batchable-but-not-multidrawable and
unbatchable logic as well, but these meshes are much rarer, so in the
interests of keeping this patch relatively small I opted to leave those
to a follow-up.
# Objective
Fix incorrect mesh culling where objects (particularly directional
shadows) were being incorrectly culled during the early preprocessing
phase. The issue manifested specifically on Apple M1 GPUs but not on
newer devices like the M4. The bug was in the
`view_frustum_intersects_obb` function, where including the w component
(plane distance) in the dot product calculations led to false positive
culling results. This caused objects to be incorrectly culled before
shadow casting could begin.
## Issue Details
The problem of missing shadows is reproducible on Apple M1 GPUs as of
this commit (bisected):
```
00722b8d0 Make indirect drawing opt-out instead of opt-in, enabling multidraw by default. (#16757)
```
and as recent as this commit:
```
c818c9214 Add option to animate materials in many_cubes (#17927)
```
- The frustum culling calculation incorrectly included the w component
(plane distance) when transforming basis vectors
- The relative radius calculation should only consider directional
transformation (xyz), not positional information (w)
- This caused false positive culling specifically on M1 devices likely
due to different device-specific floating-point behavior
- When objects were incorrectly culled, `early_instance_count` never
incremented, leading to missing geometry in the shadow pass
## Testing
- Tested on M1 and M4 devices to verify the fix
- Verified shadows and geometry render correctly on both platforms
- Confirmed the solution matches the existing Rust implementation's
behavior for calculating the relative radius:
c818c92143/crates/bevy_render/src/primitives/mod.rs (L77-L87)
- The fix resolves a mathematical error in the frustum culling
calculation while maintaining correct culling behavior across all
platforms.
---
## Showcase
`c818c9214`
<img width="1284" alt="c818c9214"
src="https://github.com/user-attachments/assets/fe1c7ea9-b13d-422e-b12d-f1cd74475213"
/>
`mate-h/frustum-cull-fix`
<img width="1283" alt="frustum-cull-fix"
src="https://github.com/user-attachments/assets/8a9ccb2a-64b6-4d5e-a17d-ac4798da5b51"
/>
# Objective
Make checked vs unchecked shaders configurable
Fixes#17786
## Solution
Added `ValidateShaders` enum to `Shader` and added
`create_and_validate_shader_module` to `RenderDevice`
## Testing
I tested the shader examples locally and they all worked. I'd like to
write a few tests to verify but am unsure how to start.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
This adds an option to animate the materials in the `many_cubes` stress
test. Each material instance `base_color` is varied each frame.
This has been tested in conjunction with the
`--vary-material-data-per-instance` and `--material-texture-count`
options.
If `--vary-material-data-per-instance` is not used it will just update
the single material, otherwise it will update all of them. If
`--material-texture-count` is used the `base_color` is multiplied with
the texture so the effect is still visible.
Because this test is focused on the performance of updating material
data and not the performance of bevy's color system it uses its own
function (`fast_hue_to_rgb`) to quickly set the hue. This appeared to be
around 8x faster than using `base_color.set_hue(hue)` in the tight loop.
The `check_visibility` system currently follows this algorithm:
1. Store all meshes that were visible last frame in the
`PreviousVisibleMeshes` set.
2. Determine which meshes are visible. For each such visible mesh,
remove it from `PreviousVisibleMeshes`.
3. Mark all meshes that remain in `PreviousVisibleMeshes` as invisible.
This algorithm would be correct if the `check_visibility` were the only
system that marked meshes visible. However, it's not: the shadow-related
systems `check_dir_light_mesh_visibility` and
`check_point_light_mesh_visibility` can as well. This results in the
following sequence of events for meshes that are in a shadow map but
*not* visible from a camera:
A. `check_visibility` runs, finds that no camera contains these meshes,
and marks them hidden, which sets the changed flag.
B. `check_dir_light_mesh_visibility` and/or
`check_point_light_mesh_visibility` run, discover that these meshes
are visible in the shadow map, and marks them as visible, again
setting the `ViewVisibility` changed flag.
C. During the extraction phase, the mesh extraction system sees that
`ViewVisibility` is changed and re-extracts the mesh.
This is inefficient and results in needless work during rendering.
This patch fixes the issue in two ways:
* The `check_dir_light_mesh_visibility` and
`check_point_light_mesh_visibility` systems now remove meshes that they
discover from `PreviousVisibleMeshes`.
* Step (3) above has been moved from `check_visibility` to a separate
system, `mark_newly_hidden_entities_invisible`. This system runs after
all visibility-determining systems, ensuring that
`PreviousVisibleMeshes` contains only those meshes that truly became
invisible on this frame.
This fix dramatically improves the performance of [the Caldera
benchmark], when combined with several other patches I've submitted.
[the Caldera benchmark]:
https://github.com/DGriffin91/bevy_caldera_scene
PR #17688 broke motion vector computation, and therefore motion blur,
because it enabled retention of `MeshInputUniform`s, and
`MeshInputUniform`s contain the indices of the previous frame's
transform and the previous frame's skinned mesh joint matrices. On frame
N, if a `MeshInputUniform` is retained on GPU from the previous frame,
the `previous_input_index` and `previous_skin_index` would refer to the
indices for frame N - 2, not the index for frame N - 1.
This patch fixes the problems. It solves these issues in two different
ways, one for transforms and one for skins:
1. To fix transforms, this patch supplies the *frame index* to the
shader as part of the view uniforms, and specifies which frame index
each mesh's previous transform refers to. So, in the situation described
above, the frame index would be N, the previous frame index would be N -
1, and the `previous_input_frame_number` would be N - 2. The shader can
now detect this situation and infer that the mesh has been retained, and
can therefore conclude that the mesh's transform hasn't changed.
2. To fix skins, this patch replaces the explicit `previous_skin_index`
with an invariant that the index of the joints for the current frame and
the index of the joints for the previous frame are the same. This means
that the `MeshInputUniform` never has to be updated even if the skin is
animated. The downside is that we have to copy joint matrices from the
previous frame's buffer to the current frame's buffer in
`extract_skins`.
The rationale behind (2) is that we currently have no mechanism to
detect when joints that affect a skin have been updated, short of
comparing all the transforms and setting a flag for
`extract_meshes_for_gpu_building` to consume, which would regress
performance as we want `extract_skins` and
`extract_meshes_for_gpu_building` to be able to run in parallel.
To test this change, use `cargo run --example motion_blur`.
Currently, the specialized pipeline cache maps a (view entity, mesh
entity) tuple to the retained pipeline for that entity. This causes two
problems:
1. Using the view entity is incorrect, because the view entity isn't
stable from frame to frame.
2. Switching the view entity to a `RetainedViewEntity`, which is
necessary for correctness, significantly regresses performance of
`specialize_material_meshes` and `specialize_shadows` because of the
loss of the fast `EntityHash`.
This patch fixes both problems by switching to a *two-level* hash table.
The outer level of the table maps each `RetainedViewEntity` to an inner
table, which maps each `MainEntity` to its pipeline ID and change tick.
Because we loop over views first and, within that loop, loop over
entities visible from that view, we hoist the slow lookup of the view
entity out of the inner entity loop.
Additionally, this patch fixes a bug whereby pipeline IDs were leaked
when removing the view. We still have a problem with leaking pipeline
IDs for deleted entities, but that won't be fixed until the specialized
pipeline cache is retained.
This patch improves performance of the [Caldera benchmark] from 7.8×
faster than 0.14 to 9.0× faster than 0.14, when applied on top of the
global binding arrays PR, #17898.
[Caldera benchmark]: https://github.com/DGriffin91/bevy_caldera_scene
Currently, Bevy rebuilds the buffer containing all the transforms for
joints every frame, during the extraction phase. This is inefficient in
cases in which many skins are present in the scene and their joints
don't move, such as the Caldera test scene.
To address this problem, this commit switches skin extraction to use a
set of retained GPU buffers with allocations managed by the offset
allocator. I use fine-grained change detection in order to determine
which skins need updating. Note that the granularity is on the level of
an entire skin, not individual joints. Using the change detection at
that level would yield poor performance in common cases in which an
entire skin is animated at once. Also, this patch yields additional
performance from the fact that changing joint transforms no longer
requires the skinned mesh to be re-extracted.
Note that this optimization can be a double-edged sword. In
`many_foxes`, fine-grained change detection regressed the performance of
`extract_skins` by 3.4x. This is because every joint is updated every
frame in that example, so change detection is pointless and is pure
overhead. Because the `many_foxes` workload is actually representative
of animated scenes, this patch includes a heuristic that disables
fine-grained change detection if the number of transformed entities in
the frame exceeds a certain fraction of the total number of joints.
Currently, this threshold is set to 25%. Note that this is a crude
heuristic, because it doesn't distinguish between the number of
transformed *joints* and the number of transformed *entities*; however,
it should be good enough to yield the optimum code path most of the
time.
Finally, this patch fixes a bug whereby skinned meshes are actually
being incorrectly retained if the buffer offsets of the joints of those
skinned meshes changes from frame to frame. To fix this without
retaining skins, we would have to re-extract every skinned mesh every
frame. Doing this was a significant regression on Caldera. With this PR,
by contrast, mesh joints stay at the same buffer offset, so we don't
have to update the `MeshInputUniform` containing the buffer offset every
frame. This also makes PR #17717 easier to implement, because that PR
uses the buffer offset from the previous frame, and the logic for
calculating that is simplified if the previous frame's buffer offset is
guaranteed to be identical to that of the current frame.
On Caldera, this patch reduces the time spent in `extract_skins` from
1.79 ms to near zero. On `many_foxes`, this patch regresses the
performance of `extract_skins` by approximately 10%-25%, depending on
the number of foxes. This has only a small impact on frame rate.
The GPU can fill out many of the fields in `IndirectParametersMetadata`
using information it already has:
* `early_instance_count` and `late_instance_count` are always
initialized to zero.
* `mesh_index` is already present in the work item buffer as the
`input_index` of the first work item in each batch.
This patch moves these fields to a separate buffer, the *GPU indirect
parameters metadata* buffer. That way, it avoids having to write them on
CPU during `batch_and_prepare_binned_render_phase`. This effectively
reduces the number of bits that that function must write per mesh from
160 to 64 (in addition to the 64 bits per mesh *instance*).
Additionally, this PR refactors `UntypedPhaseIndirectParametersBuffers`
to add another layer, `MeshClassIndirectParametersBuffers`, which allows
abstracting over the buffers corresponding indexed and non-indexed
meshes. This patch doesn't make much use of this abstraction, but
forthcoming patches will, and it's overall a cleaner approach.
This didn't seem to have much of an effect by itself on
`batch_and_prepare_binned_render_phase` time, but subsequent PRs
dependent on this PR yield roughly a 2× speedup.
# Objective
Alternative to #17894 that also cleans up the workaround from the
previous version
## Solution
Bump version and remove entry from `typos` config
# Objective
- #17787 removed sweeping of binned render phases from 2D by accident
due to them not using the `BinnedRenderPhasePlugin`.
- Fixes#17885
## Solution
- Schedule `sweep_old_entities` in `QueueSweep` like
`BinnedRenderPhasePlugin` does, but for 2D where that plugin is not
used.
## Testing
Tested with the modified `shader_defs` example in #17885 .
Fixes#17290.
<details>
<summary>Compilation errors before fix</summary>
`cargo clippy --tests --all-features --package bevy_image`:
```rust
error[E0061]: this function takes 7 arguments but 6 arguments were supplied
--> crates/bevy_core_pipeline/src/tonemapping/mod.rs:451:5
|
451 | Image::from_buffer(
| ^^^^^^^^^^^^^^^^^^
...
454 | bytes,
| ----- argument #1 of type `std::string::String` is missing
|
note: associated function defined here
--> /Users/josiahnelson/Desktop/Programming/Rust/bevy/crates/bevy_image/src/image.rs:930:12
|
930 | pub fn from_buffer(
| ^^^^^^^^^^^
help: provide the argument
|
451 | Image::from_buffer(/* std::string::String */, bytes, image_type, CompressedImageFormats::NONE, false, image_sampler, RenderAssetUsages::RENDER_WORLD)
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
`cargo clippy --tests --all-features --package bevy_gltf`:
```rust
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_channel`
--> crates/bevy_gltf/src/loader.rs:1343:13
|
1343 | specular_channel: specular.specular_channel,
| ^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_texture`
--> crates/bevy_gltf/src/loader.rs:1345:13
|
1345 | specular_texture: specular.specular_texture,
| ^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_tint_channel`
--> crates/bevy_gltf/src/loader.rs:1351:13
|
1351 | specular_tint_channel: specular.specular_color_channel,
| ^^^^^^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
error[E0560]: struct `bevy_pbr::StandardMaterial` has no field named `specular_tint_texture`
--> crates/bevy_gltf/src/loader.rs:1353:13
|
1353 | specular_tint_texture: specular.specular_color_texture,
| ^^^^^^^^^^^^^^^^^^^^^ `bevy_pbr::StandardMaterial` does not have this field
|
= note: available fields are: `emissive_exposure_weight`, `diffuse_transmission`, `diffuse_transmission_channel`, `diffuse_transmission_texture`, `flip_normal_map_y` ... and 9 others
```
</details>
# Objective
Fixes#17022
## Solution
Only enable `bevy_gltf/dds` if `bevy_gltf` is already enabled.
## Testing
Tested with empty project
```toml
[dependencies]
bevy = { version = "0.16.0-dev", path = "../bevy", default-features = false, features = [
"dds",
] }
```
### Before
```
 cargo tree --depth 1 -i bevy_gltf
bevy_gltf v0.16.0-dev (/Users/robparrett/src/bevy/crates/bevy_gltf)
└── bevy_internal v0.16.0-dev (/Users/robparrett/src/bevy/crates/bevy_internal)
```
### After
```
 cargo tree --depth 1 -i bevy_gltf
warning: nothing to print.
To find dependencies that require specific target platforms, try to use option `--target all` first, and then narrow your search scope accordingly.
```
# Context
Renaming `Parent` to `ChildOf` in #17247 has been contentious. While
those users concerns are valid (especially around legibility of code
IMO!), @cart [has
decided](https://discord.com/channels/691052431525675048/749335865876021248/1340434322833932430)
to stick with the new name.
> In general this conversation is unsurprising to me, as it played out
essentially the same way when I asked for opinions in my PR. There are
strong opinions on both sides. Everyone is right in their own way.
>
> I chose ChildOf for the following reasons:
>
> 1. I think it derives naturally from the system we have built, the
concepts we have chosen, and how we generally name the types that
implement a trait in Rust. This is the name of the type implementing
Relationship. We are adding that Relationship component to a given
entity (whether it "is" the relationship or "has" the relationship is
kind of immaterial ... we are naming the relationship that it "is" or
"has"). What is the name of the relationship that a child has to its
parent? It is a "child" of the parent of course!
> 2. In general the non-parent/child relationships I've seen in the wild
generally benefit from (or need to) use the naming convention in (1)
(aka calling the Relationship the name of the relationship the entity
has). Many relationships don't have an equivalent to the Parent/Child
name concept.
> 3. I do think we could get away with using (1) for pretty much
everything else and special casing Parent/Children. But by embracing the
naming convention, we help establish that this is in fact a pattern, and
we help prime people to think about these things in a consistent way.
Consistency and predictability is a generally desirable property. And
for something as divisive and polarizing as relationship naming, I think
drawing a hard line in the sand is to the benefit of the community as a
whole.
> 4. I believe the fact that we dont see as much of the XOf naming style
elsewhere is to our benefit. When people see things in that style, they
are primed to think of them as relationships (after some exposure to
Bevy and the ecosystem). I consider this a useful hint.
> 5. Most of the practical confusion from using ChildOf seems to be from
calling the value of the target field we read from the relationship
child_of. The name of the target field should be parent (we could even
consider renaming child_of.0 to child_of.parent for clarity). I suspect
that existing Bevy users renaming their existing code will feel the most
friction here, as this requires a reframing. Imo it is natural and
expected to receive pushback from these users hitting this case.
## Objective
The new documentation doesn't do a particularly good job at quickly
explaining the meaning of each component or how to work with them;
making a tricky migration more painful and slowing down new users as
they learn about some of the most fundamental types in Bevy.
## Solution
1. Clearly explain what each component does in the very first line,
assuming no background knowledge. This is the first relationships that
99% of users will encounter, so explaining that they are relationships
is unhelpful as an introduction.
2. Add doc aliases for the rejected `IsParent`/`IsChild`/`Parent` names,
to improve autocomplete and doc searching.
3. Do some assorted docs cleanup while we're here.
---------
Co-authored-by: Eagster <79881080+ElliottjPierce@users.noreply.github.com>
## Objective
There's no general error for when an entity doesn't exist, and some
methods are going to need one when they get Resultified. The closest
thing is `EntityFetchError`, but that error has a slightly more specific
purpose.
## Solution
- Added `EntityDoesNotExistError`.
- Contains `Entity` and `EntityDoesNotExistDetails`.
- Changed `EntityFetchError` and `QueryEntityError`:
- Changed `NoSuchEntity` variant to wrap `EntityDoesNotExistError` and
renamed the variant to `EntityDoesNotExist`.
- Renamed `EntityFetchError` to `EntityMutableFetchError` to make its
purpose clearer.
- Renamed `TryDespawnError` to `EntityDespawnError` to make it more
general.
- Changed `World::inspect_entity` to return `Result<[ok],
EntityDoesNotExistError>` instead of panicking.
- Changed `World::get_entity` and `WorldEntityFetch::fetch_ref` to
return `Result<[ok], EntityDoesNotExistError>` instead of `Result<[ok],
Entity>`.
- Changed `UnsafeWorldCell::get_entity` to return
`Result<UnsafeEntityCell, EntityDoesNotExistError>` instead of
`Option<UnsafeEntityCell>`.
## Migration Guide
- `World::inspect_entity` now returns `Result<impl Iterator<Item =
&ComponentInfo>, EntityDoesNotExistError>` instead of `impl
Iterator<Item = &ComponentInfo>`.
- `World::get_entity` now returns `EntityDoesNotExistError` as an error
instead of `Entity`. You can still access the entity's ID through the
error's `entity` field.
- `UnsafeWorldCell::get_entity` now returns `Result<UnsafeEntityCell,
EntityDoesNotExistError>` instead of `Option<UnsafeEntityCell>`.
# Objective
Fix panic in `custom_render_phase`.
This example was broken by #17764, but that breakage evolved into a
panic after #17849. This new panic seems to illustrate the problem in a
pretty straightforward way.
```
2025-02-15T00:44:11.833622Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "macOS 15.3 Sequoia", kernel: "24.3.0", cpu: "Apple M4 Max", core_count: "16", memory: "64.0 GiB" }
2025-02-15T00:44:11.908328Z INFO bevy_render::renderer: AdapterInfo { name: "Apple M4 Max", vendor: 0, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Metal }
2025-02-15T00:44:12.314930Z INFO bevy_winit::system: Creating new window App (0v1)
thread 'Compute Task Pool (1)' panicked at /Users/me/src/bevy/crates/bevy_ecs/src/system/function_system.rs:216:28:
bevy_render::batching::gpu_preprocessing::batch_and_prepare_sorted_render_phase<custom_render_phase::Stencil3d, custom_render_phase::StencilPipeline> could not access system parameter ResMut<PhaseBatchedInstanceBuffers<Stencil3d, MeshUniform>>
```
## Solution
Add a `SortedRenderPhasePlugin` for the custom phase.
## Testing
`cargo run --example custom_render_phase`
Appending to these vectors is performance-critical in
`batch_and_prepare_binned_render_phase`, so `RawBufferVec`, which
doesn't have the overhead of `encase`, is more appropriate.
The `collect_buffers_for_phase` system tries to reuse these buffers, but
its efforts are stymied by the fact that
`clear_batched_gpu_instance_buffers` clears the containing hash table
and therefore frees the buffers. This patch makes
`clear_batched_gpu_instance_buffers` stop doing that so that the
allocations can be reused.
# Objective
Simplify the API surface by removing duplicated functionality between
`Query` and `QueryState`.
Reduce the amount of `unsafe` code required in `QueryState`.
This is a follow-up to #15858.
## Solution
Move implementations of `Query` methods from `QueryState` to `Query`.
Instead of the original methods being on `QueryState`, with `Query`
methods calling them by passing the individual parameters, the original
methods are now on `Query`, with `QueryState` methods calling them by
constructing a `Query`.
This also adds two `_inner` methods that were missed in #15858:
`iter_many_unique_inner` and `single_inner`.
One goal here is to be able to deprecate and eventually remove many of
the methods on `QueryState`, reducing the overall API surface. (I
expected to do that in this PR, but this change was large enough on its
own!) Now that the `QueryState` methods each consist of a simple
expression like `self.query(world).get_inner(entity)`, a future PR can
deprecate some or all of them with simple migration instructions.
The other goal is to reduce the amount of `unsafe` code. The current
implementation of a read-only method like `QueryState::get` directly
calls the `unsafe fn get_unchecked_manual` and needs to repeat the proof
that `&World` has enough access. With this change, `QueryState::get` is
entirely safe code, with the proof that `&World` has enough access done
by the `query()` method and shared across all read-only operations.
## Future Work
The next step will be to mark the `QueryState` methods as
`#[deprecated]` and migrate callers to the methods on `Query`.
# Objective
Support accessing resources using reflection when using
`FilteredResources` in a dynamic system. This is similar to how
components can be queried using reflection when using
`FilteredEntityRef|Mut`.
## Solution
Change `ReflectResource` from taking `&World` and `&mut World` to taking
`impl Into<FilteredResources>` and `impl Into<FilteredResourcesMut>`,
similar to how `ReflectComponent` takes `impl Into<FilteredEntityRef>`
and `impl Into<FilteredEntityMut>`. There are `From` impls that ensure
code passing `&World` and `&mut World` continues to work as before.
## Migration Guide
If you are manually creating a `ReflectComponentFns` struct, the
`reflect` function now takes `FilteredResources` instead `&World`, and
there is a new `reflect_mut` function that takes `FilteredResourcesMut`.
# Objective
Add reference to reported position space in picking backend docs.
Fixes#17844
## Solution
Add explanatory docs to the implementation notes of each picking
backend.
## Testing
`cargo r -p ci -- doc-check` & `cargo r -p ci -- lints`
# Objective
It is impossible to register a type with `TypeRegistry::register` if the
type is unnameable (in the current scope).
## Solution
Add `TypeRegistry::register_by_val` which mirrors std's `size_of_val`
and friends.
## Testing
There's a doc test (unrelated but there seem to be some pre-existing
broken doc links in `bevy_reflect`).
There was nonsense code in `batch_and_prepare_sorted_render_phase` that
created temporary buffers to add objects to instead of using the correct
ones. I think this was debug code. This commit removes that code in
favor of writing to the actual buffers.
Closes#17846.
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
`bevy_assets` has long been unapproachable for contributors and users.
More and better documentation would help that.
We're gradually moving towards globally denying missing docs (#3492)!
However, writing all of the hundreds of missing doc strings in a single
go will be miserable to review.
## Solution
Remove the allow for missing docs temporarily, and then pick some easy
missing doc warnings largely at random to tackle.
Stop when the change set is starting to feel intimidating.
# Objective
Fixes#17851
## Solution
Align the `slider` uniform to 16 bytes by making it a `vec4`.
## Testing
Run the example using:
```
cargo run -p build-wasm-example -- --api webgl2 ui_material
basic-http-server examples/wasm/
```
The `output_index` field is only used in direct mode, and the
`indirect_parameters_index` field is only used in indirect mode.
Consequently, we can combine them into a single field, reducing the size
of `PreprocessWorkItem`, which
`batch_and_prepare_{binned,sorted}_render_phase` must construct every
frame for every mesh instance, from 96 bits to 64 bits.
# Objective
Continuation of #16547.
We do not yet have parallel versions of `par_iter_many` and
`par_iter_many_unique`. It is currently very painful to try and use
parallel iteration over entity lists. Even if a list is not long, each
operation might still be very expensive, and worth parallelizing.
Plus, it has been requested several times!
## Solution
Once again, we implement what we lack!
These parallel iterators collect their input entity list into a
`Vec`/`UniqueEntityVec`, then chunk that over the available threads,
inspired by the original `par_iter`.
Since no order guarantee is given to the caller, we could sort the input
list according to `EntityLocation`, but that would likely only be worth
it for very large entity lists.
There is some duplication which could likely be improved, but I'd like
to leave that for a follow-up.
## Testing
The doc tests on `for_each_init` of `QueryParManyIter` and
`QueryParManyUniqueIter`.
# Objective
While surveying the state of documentation for bevy_assets, I noticed a
few minor issues.
## Solution
Revise the docs to focus on clear explanations of core ideas and
cross-linking related objects.
# Objective
Update typos, fix new typos.
1.29.6 was just released to fix an
[issue](https://github.com/crate-ci/typos/issues/1228) where January's
corrections were not included in the binaries for the last release.
Reminder: typos can be tossed in the monthly [non-critical corrections
issue](https://github.com/crate-ci/typos/issues/1221).
## Solution
I chose to allow `implementors`, because a good argument seems to be
being made [here](https://github.com/crate-ci/typos/issues/1226) and
there is now a PR to address that.
## Discussion
Should I exclude `bevy_mikktspace`?
At one point I think we had an informal policy of "don't mess with
mikktspace until https://github.com/bevyengine/bevy/pull/9050 is merged"
but it doesn't seem like that is likely to be merged any time soon.
I think these particular corrections in mikktspace are fine because
- The same typo mistake seems to have been fixed in that PR
- The entire file containing these corrections was deleted in that PR
## Typo of the Month
correspindong -> corresponding
# Objective
Updates the now inaccurate position docs
Fixes#17832
## Solution
From
`The position of the intersection in the world, if the data is available
from the backend.`
To
`The position reported by the backend, if the data is available.
Position data may be in any space (e.g. World space, Screen space, Local
space), specified by the backend providing it.`
## Testing
uhh reading :)
Currently, invocations of `batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` can't run in parallel because
they write to scene-global GPU buffers. After PR #17698,
`batch_and_prepare_binned_render_phase` started accounting for the
lion's share of the CPU time, causing us to be strongly CPU bound on
scenes like Caldera when occlusion culling was on (because of the
overhead of batching for the Z-prepass). Although I eventually plan to
optimize `batch_and_prepare_binned_render_phase`, we can obtain
significant wins now by parallelizing that system across phases.
This commit splits all GPU buffers that
`batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` touches into separate buffers
for each phase so that the scheduler will run those phases in parallel.
At the end of batch preparation, we gather the render phases up into a
single resource with a new *collection* phase. Because we already run
mesh preprocessing separately for each phase in order to make occlusion
culling work, this is actually a cleaner separation. For example, mesh
output indices (the unique ID that identifies each mesh instance on GPU)
are now guaranteed to be sequential starting from 0, which will simplify
the forthcoming work to remove them in favor of the compute dispatch ID.
On Caldera, this brings the frame time down to approximately 9.1 ms with
occlusion culling on.

# Objective
Fix unsoundness introduced by #15858. `QueryLens::query()` would hand
out a `Query` with the full `'w` lifetime, and the new `_inner` methods
would let the results outlive the `Query`. This could be used to create
aliasing mutable references, like
```rust
fn bad<'w>(mut lens: QueryLens<'w, EntityMut>, entity: Entity) {
let one: EntityMut<'w> = lens.query().get_inner(entity).unwrap();
let two: EntityMut<'w> = lens.query().get_inner(entity).unwrap();
assert!(one.entity() == two.entity());
}
```
Fixes#17693
## Solution
Restrict the `'world` lifetime in the `Query` returned by
`QueryLens::query()` to `'_`, the lifetime of the borrow of the
`QueryLens`.
The model here is that `Query<'w, 's, D, F>` and `QueryLens<'w, D, F>`
have permission to access their components for the lifetime `'w`. So
going from `&'a mut QueryLens<'w>` to `Query<'w, 'a>` would borrow the
permission only for the `'a` lifetime, but incorrectly give it out for
the full `'w` lifetime.
To handle any cases where users were calling `get_inner()` or
`iter_inner()` on the `Query` and expecting the full `'w` lifetime, we
introduce a new `QueryLens::query_inner()` method. This is only valid
for `ReadOnlyQueryData`, so it may safely hand out a copy of the
permission for the full `'w` lifetime. Since `get_inner()` and
`iter_inner()` were only valid on `ReadOnlyQueryData` prior to #15858,
that should cover any uses that relied on the longer lifetime.
## Migration Guide
Users of `QueryLens::query()` who were calling `get_inner()` or
`iter_inner()` will need to replace the call with
`QueryLens::query_inner()`.
Conceptually, bins are ordered hash maps. We currently implement these
as a list of keys with an associated hash map. But we already have a
data type that implements ordered hash maps directly: `IndexMap`. This
patch switches Bevy to use `IndexMap`s for bins. Because we're memory
bound, this doesn't affect performance much, but it is cleaner.
# Objective
Related to #17784. The ticket is actually about just getting rid of
`Entity{Ref,Mut}Except` in favor of `FilteredEntity{Ref,Mut}`, but I got
told the unification of Entity types is a bigger endeavor that has been
going on for a while now (as the "Pointing Fingers" working group) and I
should just add the functions I actually need in the meantime.
## Solution
This PR adds all of the functions necessary to access components by
TypeId or ComponentId instead of static types.
## Testing
> Did you test these changes? If so, how?
Haven't tested it yet, but the changes are mostly copy/paste from other
implementations in the same file, since there is a lot of duplicated
functionality there.
## Not a Migration Guide
There shouldn't be any breaking changes, it's just a few new functions
on existing types.
I had to shuffle around the lifetimes in `From<&EntityMutExcept<'a, B>>
for EntityRefExcept<'a, B>` (originally it was `From<&'a
EntityMutExcept<'_, B>> for EntityRefExcept<'_, B>`) to make the borrow
checker happy, but I don't think that this should have an impact on user
code (correct me if I'm wrong).
* Use texture atomics rather than buffer atomics for the visbuffer
(haven't tested perf on a raster-heavy scene yet)
* Unfortunately to clear the visbuffer we now need a compute pass to
clear it. Using wgpu's clear_texture function internally uses a buffer
-> image copy that's insanely expensive. Ideally it should be using
vkCmdClearColorImage, which I've opened an issue for
https://github.com/gfx-rs/wgpu/issues/7090. For now we'll have to stick
with a custom compute pass and all the extra code that brings.
* Faster resolve depth pass by discarding 0 depth pixels instead of
redundantly writing zero (2x faster for big depth textures like shadow
views)
# Objective
Tidy up a few little things I noticed while working with this example
## Solution
- Fix manual resetting of a repeating timer
- Use atlas image size instead of hardcoded value. Atlases are always
512x512 right now, but hopefully not in the future.
- Pluralize a variable name for a variable holding a `Vec`
# Objective
I'm working on some PRs involving our font atlases and it would be nice
to be able to test these scenarios separately to better understand the
performance tradeoffs in different situations.
## Solution
Add a `many-font-sizes` option.
The old behavior is still available by running with `--many-glyphs
--many-font-sizes`.
## Testing
`cargo run --example many_text2d --release`
`cargo run --example many_text2d --release -- --many-glyphs`
`cargo run --example many_text2d --release -- --many-font-sizes`
`cargo run --example many_text2d --release -- --many-glyphs
--many-font-sizes`
# Objective
- Wgpu has some expensive code it injects into shaders to avoid the
possibility of things like infinite loops. Generally our shaders are
written by users who won't do this, so it just makes our shaders perform
worse.
## Solution
- Turn off the checks.
- We could try to conditionally keep them, but that complicates the code
and 99.9% of users won't want this.
## Migration Guide
- Bevy no longer turns on wgpu's runtime safety checks
https://docs.rs/wgpu/latest/wgpu/struct.ShaderRuntimeChecks.html. If you
were using Bevy with untrusted shaders, please file an issue.
---------
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
# Objective
Continuation of #17589 and #16547.
Slices have several methods that return iterators which themselves yield
slices, which we have not yet implemented.
An example use is `par_iter_many` style logic.
## Solution
Their implementation is rather straightforward, we simply delegate all
impls to `[T]`.
The resulting iterator types need their own wrappers in the form of
`UniqueEntitySliceIter` and `UniqueEntitySliceIterMut`.
We also add three free functions that cast slices of entity slices to
slices of `UniqueEntitySlice`.
These three should be sufficient, though infinite nesting is achievable
with a trait (like `TrustedEntityBorrow` works over infinite reference
nesting), should the need ever arise.
# Objective
Fixes#17810
Y'all picked the option I already implemented, yay.
## Solution
Add a system that panics if the load state of an asset is `Failed`.
## Testing
`cargo run --example scene`
- Tested with valid scene file
- Introduced a syntax error in the scene file
- Deleted the scene file