# Objective
- Progress towards #19887.
- I am avoiding dealing with the `occlusion_culling` example since it is
kinda annoying to resolve nicely (I will do so in another PR).
## Solution
- Rewrite these examples to replace FromWorld implementations with
systems and other resource changes with systems as well.
## Testing
- All the changed examples have been tested and still work.
A few versions ago, wgpu made it possible to set shader entry point to
`None`, which will select the correct entry point in file where only a
single entrypoint is specified. This makes it possible to implement
`Default` for pipeline descriptors. This PR does so and attempts to
`..default()` everything possible.
# Objective
- This unblocks some work I am doing for #19887.
## Solution
- Rename `RenderGraphApp` to `RenderGraphExt`.
- Implement `RenderGraphExt` for `World`.
- Change `SubApp` and `App` to call the `World` impl.
# Objective
Upgrade to `wgpu` version `25.0`.
Depends on https://github.com/bevyengine/naga_oil/pull/121
## Solution
### Problem
The biggest issue we face upgrading is the following requirement:
> To facilitate this change, there was an additional validation rule put
in place: if there is a binding array in a bind group, you may not use
dynamic offset buffers or uniform buffers in that bind group. This
requirement comes from vulkan rules on UpdateAfterBind descriptors.
This is a major difficulty for us, as there are a number of binding
arrays that are used in the view bind group. Note, this requirement does
not affect merely uniform buffors that use dynamic offset but the use of
*any* uniform in a bind group that also has a binding array.
### Attempted fixes
The easiest fix would be to change uniforms to be storage buffers
whenever binding arrays are in use:
```wgsl
#ifdef BINDING_ARRAYS_ARE_USED
@group(0) @binding(0) var<uniform> view: View;
@group(0) @binding(1) var<uniform> lights: types::Lights;
#else
@group(0) @binding(0) var<storage> view: array<View>;
@group(0) @binding(1) var<storage> lights: array<types::Lights>;
#endif
```
This requires passing the view index to the shader so that we know where
to index into the buffer:
```wgsl
struct PushConstants {
view_index: u32,
}
var<push_constant> push_constants: PushConstants;
```
Using push constants is no problem because binding arrays are only
usable on native anyway.
However, this greatly complicates the ability to access `view` in
shaders. For example:
```wgsl
#ifdef BINDING_ARRAYS_ARE_USED
mesh_view_bindings::view.view_from_world[0].z
#else
mesh_view_bindings::view[mesh_view_bindings::view_index].view_from_world[0].z
#endif
```
Using this approach would work but would have the effect of polluting
our shaders with ifdef spam basically *everywhere*.
Why not use a function? Unfortunately, the following is not valid wgsl
as it returns a binding directly from a function in the uniform path.
```wgsl
fn get_view() -> View {
#if BINDING_ARRAYS_ARE_USED
let view_index = push_constants.view_index;
let view = views[view_index];
#endif
return view;
}
```
This also poses problems for things like lights where we want to return
a ptr to the light data. Returning ptrs from wgsl functions isn't
allowed even if both bindings were buffers.
The next attempt was to simply use indexed buffers everywhere, in both
the binding array and non binding array path. This would be viable if
push constants were available everywhere to pass the view index, but
unfortunately they are not available on webgpu. This means either
passing the view index in a storage buffer (not ideal for such a small
amount of state) or using push constants sometimes and uniform buffers
only on webgpu. However, this kind of conditional layout infects
absolutely everything.
Even if we were to accept just using storage buffer for the view index,
there's also the additional problem that some dynamic offsets aren't
actually per-view but per-use of a setting on a camera, which would
require passing that uniform data on *every* camera regardless of
whether that rendering feature is being used, which is also gross.
As such, although it's gross, the simplest solution just to bump binding
arrays into `@group(1)` and all other bindings up one bind group. This
should still bring us under the device limit of 4 for most users.
### Next steps / looking towards the future
I'd like to avoid needing split our view bind group into multiple parts.
In the future, if `wgpu` were to add `@builtin(draw_index)`, we could
build a list of draw state in gpu processing and avoid the need for any
kind of state change at all (see
https://github.com/gfx-rs/wgpu/issues/6823). This would also provide
significantly more flexibility to handle things like offsets into other
arrays that may not be per-view.
### Testing
Tested a number of examples, there are probably more that are still
broken.
---------
Co-authored-by: François Mockers <mockersf@gmail.com>
Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>
# Objective
Fix https://github.com/bevyengine/bevy/issues/19617
# Solution
Add newlines before all impl blocks.
I suspect that at least some of these will be objectionable! If there's
a desired Bevy style for this then I'll update the PR. If not then we
can just close it - it's the work of a single find and replace.
# Objective
Improve the performance of `FilteredEntity(Ref|Mut)` and
`Entity(Ref|Mut)Except`.
`FilteredEntityRef` needs an `Access<ComponentId>` to determine what
components it can access. There is one stored in the query state, but
query items cannot borrow from the state, so it has to `clone()` the
access for each row. Cloning the access involves memory allocations and
can be expensive.
## Solution
Let query items borrow from their query state.
Add an `'s` lifetime to `WorldQuery::Item` and `WorldQuery::Fetch`,
similar to the one in `SystemParam`, and provide `&'s Self::State` to
the fetch so that it can borrow from the state.
Unfortunately, there are a few cases where we currently return query
items from temporary query states: the sorted iteration methods create a
temporary state to query the sort keys, and the
`EntityRef::components<Q>()` methods create a temporary state for their
query.
To allow these to continue to work with most `QueryData`
implementations, introduce a new subtrait `ReleaseStateQueryData` that
converts a `QueryItem<'w, 's>` to `QueryItem<'w, 'static>`, and is
implemented for everything except `FilteredEntity(Ref|Mut)` and
`Entity(Ref|Mut)Except`.
`#[derive(QueryData)]` will generate `ReleaseStateQueryData`
implementations that apply when all of the subqueries implement
`ReleaseStateQueryData`.
This PR does not actually change the implementation of
`FilteredEntity(Ref|Mut)` or `Entity(Ref|Mut)Except`! That will be done
as a follow-up PR so that the changes are easier to review. I have
pushed the changes as chescock/bevy#5.
## Testing
I ran performance traces of many_foxes, both against main and against
chescock/bevy#5, both including #15282. These changes do appear to make
generalized animation a bit faster:
(Red is main, yellow is chescock/bevy#5)

## Migration Guide
The `WorldQuery::Item` and `WorldQuery::Fetch` associated types and the
`QueryItem` and `ROQueryItem` type aliases now have an additional
lifetime parameter corresponding to the `'s` lifetime in `Query`. Manual
implementations of `WorldQuery` will need to update the method
signatures to include the new lifetimes. Other uses of the types will
need to be updated to include a lifetime parameter, although it can
usually be passed as `'_`. In particular, `ROQueryItem` is used when
implementing `RenderCommand`.
Before:
```rust
fn render<'w>(
item: &P,
view: ROQueryItem<'w, Self::ViewQuery>,
entity: Option<ROQueryItem<'w, Self::ItemQuery>>,
param: SystemParamItem<'w, '_, Self::Param>,
pass: &mut TrackedRenderPass<'w>,
) -> RenderCommandResult;
```
After:
```rust
fn render<'w>(
item: &P,
view: ROQueryItem<'w, '_, Self::ViewQuery>,
entity: Option<ROQueryItem<'w, '_, Self::ItemQuery>>,
param: SystemParamItem<'w, '_, Self::Param>,
pass: &mut TrackedRenderPass<'w>,
) -> RenderCommandResult;
```
---
Methods on `QueryState` that take `&mut self` may now result in
conflicting borrows if the query items capture the lifetime of the
mutable reference. This affects `get()`, `iter()`, and others. To fix
the errors, first call `QueryState::update_archetypes()`, and then
replace a call `state.foo(world, param)` with
`state.query_manual(world).foo_inner(param)`. Alternately, you may be
able to restructure the code to call `state.query(world)` once and then
make multiple calls using the `Query`.
Before:
```rust
let mut state: QueryState<_, _> = ...;
let d1 = state.get(world, e1);
let d2 = state.get(world, e2); // Error: cannot borrow `state` as mutable more than once at a time
println!("{d1:?}");
println!("{d2:?}");
```
After:
```rust
let mut state: QueryState<_, _> = ...;
state.update_archetypes(world);
let d1 = state.get_manual(world, e1);
let d2 = state.get_manual(world, e2);
// OR
state.update_archetypes(world);
let d1 = state.query(world).get_inner(e1);
let d2 = state.query(world).get_inner(e2);
// OR
let query = state.query(world);
let d1 = query.get_inner(e1);
let d1 = query.get_inner(e2);
println!("{d1:?}");
println!("{d2:?}");
```
# Objective
Fixes a part of #14274.
Bevy has an incredibly inconsistent naming convention for its system
sets, both internally and across the ecosystem.
<img alt="System sets in Bevy"
src="https://github.com/user-attachments/assets/d16e2027-793f-4ba4-9cc9-e780b14a5a1b"
width="450" />
*Names of public system set types in Bevy*
Most Bevy types use a naming of `FooSystem` or just `Foo`, but there are
also a few `FooSystems` and `FooSet` types. In ecosystem crates on the
other hand, `FooSet` is perhaps the most commonly used name in general.
Conventions being so wildly inconsistent can make it harder for users to
pick names for their own types, to search for system sets on docs.rs, or
to even discern which types *are* system sets.
To reign in the inconsistency a bit and help unify the ecosystem, it
would be good to establish a common recommended naming convention for
system sets in Bevy itself, similar to how plugins are commonly suffixed
with `Plugin` (ex: `TimePlugin`). By adopting a consistent naming
convention in first-party Bevy, we can softly nudge ecosystem crates to
follow suit (for types where it makes sense to do so).
Choosing a naming convention is also relevant now, as the [`bevy_cli`
recently adopted
lints](https://github.com/TheBevyFlock/bevy_cli/pull/345) to enforce
naming for plugins and system sets, and the recommended naming used for
system sets is still a bit open.
## Which Name To Use?
Now the contentious part: what naming convention should we actually
adopt?
This was discussed on the Bevy Discord at the end of last year, starting
[here](<https://discord.com/channels/691052431525675048/692572690833473578/1310659954683936789>).
`FooSet` and `FooSystems` were the clear favorites, with `FooSet` very
narrowly winning an unofficial poll. However, it seems to me like the
consensus was broadly moving towards `FooSystems` at the end and after
the poll, with Cart
([source](https://discord.com/channels/691052431525675048/692572690833473578/1311140204974706708))
and later Alice
([source](https://discord.com/channels/691052431525675048/692572690833473578/1311092530732859533))
and also me being in favor of it.
Let's do a quick pros and cons list! Of course these are just what I
thought of, so take it with a grain of salt.
`FooSet`:
- Pro: Nice and short!
- Pro: Used by many ecosystem crates.
- Pro: The `Set` suffix comes directly from the trait name `SystemSet`.
- Pro: Pairs nicely with existing APIs like `in_set` and
`configure_sets`.
- Con: `Set` by itself doesn't actually indicate that it's related to
systems *at all*, apart from the implemented trait. A set of what?
- Con: Is `FooSet` a set of `Foo`s or a system set related to `Foo`? Ex:
`ContactSet`, `MeshSet`, `EnemySet`...
`FooSystems`:
- Pro: Very clearly indicates that the type represents a collection of
systems. The actual core concept, system(s), is in the name.
- Pro: Parallels nicely with `FooPlugins` for plugin groups.
- Pro: Low risk of conflicts with other names or misunderstandings about
what the type is.
- Pro: In most cases, reads *very* nicely and clearly. Ex:
`PhysicsSystems` and `AnimationSystems` as opposed to `PhysicsSet` and
`AnimationSet`.
- Pro: Easy to search for on docs.rs.
- Con: Usually results in longer names.
- Con: Not yet as widely used.
Really the big problem with `FooSet` is that it doesn't actually
describe what it is. It describes what *kind of thing* it is (a set of
something), but not *what it is a set of*, unless you know the type or
check its docs or implemented traits. `FooSystems` on the other hand is
much more self-descriptive in this regard, at the cost of being a bit
longer to type.
Ultimately, in some ways it comes down to preference and how you think
of system sets. Personally, I was originally in favor of `FooSet`, but
have been increasingly on the side of `FooSystems`, especially after
seeing what the new names would actually look like in Avian and now
Bevy. I prefer it because it usually reads better, is much more clearly
related to groups of systems than `FooSet`, and overall *feels* more
correct and natural to me in the long term.
For these reasons, and because Alice and Cart also seemed to share a
preference for it when it was previously being discussed, I propose that
we adopt a `FooSystems` naming convention where applicable.
## Solution
Rename Bevy's system set types to use a consistent `FooSet` naming where
applicable.
- `AccessibilitySystem` → `AccessibilitySystems`
- `GizmoRenderSystem` → `GizmoRenderSystems`
- `PickSet` → `PickingSystems`
- `RunFixedMainLoopSystem` → `RunFixedMainLoopSystems`
- `TransformSystem` → `TransformSystems`
- `RemoteSet` → `RemoteSystems`
- `RenderSet` → `RenderSystems`
- `SpriteSystem` → `SpriteSystems`
- `StateTransitionSteps` → `StateTransitionSystems`
- `RenderUiSystem` → `RenderUiSystems`
- `UiSystem` → `UiSystems`
- `Animation` → `AnimationSystems`
- `AssetEvents` → `AssetEventSystems`
- `TrackAssets` → `AssetTrackingSystems`
- `UpdateGizmoMeshes` → `GizmoMeshSystems`
- `InputSystem` → `InputSystems`
- `InputFocusSet` → `InputFocusSystems`
- `ExtractMaterialsSet` → `MaterialExtractionSystems`
- `ExtractMeshesSet` → `MeshExtractionSystems`
- `RumbleSystem` → `RumbleSystems`
- `CameraUpdateSystem` → `CameraUpdateSystems`
- `ExtractAssetsSet` → `AssetExtractionSystems`
- `Update2dText` → `Text2dUpdateSystems`
- `TimeSystem` → `TimeSystems`
- `AudioPlaySet` → `AudioPlaybackSystems`
- `SendEvents` → `EventSenderSystems`
- `EventUpdates` → `EventUpdateSystems`
A lot of the names got slightly longer, but they are also a lot more
consistent, and in my opinion the majority of them read much better. For
a few of the names I took the liberty of rewording things a bit;
definitely open to any further naming improvements.
There are still also cases where the `FooSystems` naming doesn't really
make sense, and those I left alone. This primarily includes system sets
like `Interned<dyn SystemSet>`, `EnterSchedules<S>`, `ExitSchedules<S>`,
or `TransitionSchedules<S>`, where the type has some special purpose and
semantics.
## Todo
- [x] Should I keep all the old names as deprecated type aliases? I can
do this, but to avoid wasting work I'd prefer to first reach consensus
on whether these renames are even desired.
- [x] Migration guide
- [x] Release notes
# Objective
The goal of `bevy_platform_support` is to provide a set of platform
agnostic APIs, alongside platform-specific functionality. This is a high
traffic crate (providing things like HashMap and Instant). Especially in
light of https://github.com/bevyengine/bevy/discussions/18799, it
deserves a friendlier / shorter name.
Given that it hasn't had a full release yet, getting this change in
before Bevy 0.16 makes sense.
## Solution
- Rename `bevy_platform_support` to `bevy_platform`.
Currently, Bevy rebuilds the buffer containing all the transforms for
joints every frame, during the extraction phase. This is inefficient in
cases in which many skins are present in the scene and their joints
don't move, such as the Caldera test scene.
To address this problem, this commit switches skin extraction to use a
set of retained GPU buffers with allocations managed by the offset
allocator. I use fine-grained change detection in order to determine
which skins need updating. Note that the granularity is on the level of
an entire skin, not individual joints. Using the change detection at
that level would yield poor performance in common cases in which an
entire skin is animated at once. Also, this patch yields additional
performance from the fact that changing joint transforms no longer
requires the skinned mesh to be re-extracted.
Note that this optimization can be a double-edged sword. In
`many_foxes`, fine-grained change detection regressed the performance of
`extract_skins` by 3.4x. This is because every joint is updated every
frame in that example, so change detection is pointless and is pure
overhead. Because the `many_foxes` workload is actually representative
of animated scenes, this patch includes a heuristic that disables
fine-grained change detection if the number of transformed entities in
the frame exceeds a certain fraction of the total number of joints.
Currently, this threshold is set to 25%. Note that this is a crude
heuristic, because it doesn't distinguish between the number of
transformed *joints* and the number of transformed *entities*; however,
it should be good enough to yield the optimum code path most of the
time.
Finally, this patch fixes a bug whereby skinned meshes are actually
being incorrectly retained if the buffer offsets of the joints of those
skinned meshes changes from frame to frame. To fix this without
retaining skins, we would have to re-extract every skinned mesh every
frame. Doing this was a significant regression on Caldera. With this PR,
by contrast, mesh joints stay at the same buffer offset, so we don't
have to update the `MeshInputUniform` containing the buffer offset every
frame. This also makes PR #17717 easier to implement, because that PR
uses the buffer offset from the previous frame, and the logic for
calculating that is simplified if the previous frame's buffer offset is
guaranteed to be identical to that of the current frame.
On Caldera, this patch reduces the time spent in `extract_skins` from
1.79 ms to near zero. On `many_foxes`, this patch regresses the
performance of `extract_skins` by approximately 10%-25%, depending on
the number of foxes. This has only a small impact on frame rate.
The GPU can fill out many of the fields in `IndirectParametersMetadata`
using information it already has:
* `early_instance_count` and `late_instance_count` are always
initialized to zero.
* `mesh_index` is already present in the work item buffer as the
`input_index` of the first work item in each batch.
This patch moves these fields to a separate buffer, the *GPU indirect
parameters metadata* buffer. That way, it avoids having to write them on
CPU during `batch_and_prepare_binned_render_phase`. This effectively
reduces the number of bits that that function must write per mesh from
160 to 64 (in addition to the 64 bits per mesh *instance*).
Additionally, this PR refactors `UntypedPhaseIndirectParametersBuffers`
to add another layer, `MeshClassIndirectParametersBuffers`, which allows
abstracting over the buffers corresponding indexed and non-indexed
meshes. This patch doesn't make much use of this abstraction, but
forthcoming patches will, and it's overall a cleaner approach.
This didn't seem to have much of an effect by itself on
`batch_and_prepare_binned_render_phase` time, but subsequent PRs
dependent on this PR yield roughly a 2× speedup.
# Objective
Fix panic in `custom_render_phase`.
This example was broken by #17764, but that breakage evolved into a
panic after #17849. This new panic seems to illustrate the problem in a
pretty straightforward way.
```
2025-02-15T00:44:11.833622Z INFO bevy_diagnostic::system_information_diagnostics_plugin::internal: SystemInfo { os: "macOS 15.3 Sequoia", kernel: "24.3.0", cpu: "Apple M4 Max", core_count: "16", memory: "64.0 GiB" }
2025-02-15T00:44:11.908328Z INFO bevy_render::renderer: AdapterInfo { name: "Apple M4 Max", vendor: 0, device: 0, device_type: IntegratedGpu, driver: "", driver_info: "", backend: Metal }
2025-02-15T00:44:12.314930Z INFO bevy_winit::system: Creating new window App (0v1)
thread 'Compute Task Pool (1)' panicked at /Users/me/src/bevy/crates/bevy_ecs/src/system/function_system.rs:216:28:
bevy_render::batching::gpu_preprocessing::batch_and_prepare_sorted_render_phase<custom_render_phase::Stencil3d, custom_render_phase::StencilPipeline> could not access system parameter ResMut<PhaseBatchedInstanceBuffers<Stencil3d, MeshUniform>>
```
## Solution
Add a `SortedRenderPhasePlugin` for the custom phase.
## Testing
`cargo run --example custom_render_phase`
Currently, invocations of `batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` can't run in parallel because
they write to scene-global GPU buffers. After PR #17698,
`batch_and_prepare_binned_render_phase` started accounting for the
lion's share of the CPU time, causing us to be strongly CPU bound on
scenes like Caldera when occlusion culling was on (because of the
overhead of batching for the Z-prepass). Although I eventually plan to
optimize `batch_and_prepare_binned_render_phase`, we can obtain
significant wins now by parallelizing that system across phases.
This commit splits all GPU buffers that
`batch_and_prepare_binned_render_phase` and
`batch_and_prepare_sorted_render_phase` touches into separate buffers
for each phase so that the scheduler will run those phases in parallel.
At the end of batch preparation, we gather the render phases up into a
single resource with a new *collection* phase. Because we already run
mesh preprocessing separately for each phase in order to make occlusion
culling work, this is actually a cleaner separation. For example, mesh
output indices (the unique ID that identifies each mesh instance on GPU)
are now guaranteed to be sequential starting from 0, which will simplify
the forthcoming work to remove them in favor of the compute dispatch ID.
On Caldera, this brings the frame time down to approximately 9.1 ms with
occlusion culling on.

Currently, we look up each `MeshInputUniform` index in a hash table that
maps the main entity ID to the index every frame. This is inefficient,
cache unfriendly, and unnecessary, as the `MeshInputUniform` index for
an entity remains the same from frame to frame (even if the input
uniform changes). This commit changes the `IndexSet` in the `RenderBin`
to an `IndexMap` that maps the `MainEntity` to `MeshInputUniformIndex`
(a new type that this patch adds for more type safety).
On Caldera with parallel `batch_and_prepare_binned_render_phase`, this
patch improves that function from 3.18 ms to 2.42 ms, a 31% speedup.
# Objective
Because of mesh preprocessing, users cannot rely on
`@builtin(instance_index)` in order to reference external data, as the
instance index is not stable, either from frame to frame or relative to
the total spawn order of mesh instances.
## Solution
Add a user supplied mesh index that can be used for referencing external
data when drawing instanced meshes.
Closes#13373
## Testing
Benchmarked `many_cubes` showing no difference in total frame time.
## Showcase
https://github.com/user-attachments/assets/80620147-aafc-4d9d-a8ee-e2149f7c8f3b
---------
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
# Objective
- It's currently very hard for beginners and advanced users to get a
full understanding of a complete render phase.
## Solution
- Implement a full custom render phase
- The render phase in the example is intended to show a custom stencil
phase that renders the stencil in red directly on the screen
---
## Showcase
<img width="1277" alt="image"
src="https://github.com/user-attachments/assets/e9dc0105-4fb6-463f-ad53-0529b575fd28"
/>
## Notes
More docs to explain what is going on is still needed but the example
works and can already help some people.
We might want to consider using a batched phase and cold specialization
in the future, but the example is already complex enough as it is.
---------
Co-authored-by: Christopher Biscardi <chris@christopherbiscardi.com>