Commit Graph

214 Commits

Author SHA1 Message Date
charlotte
a4640046fc
Adds ShaderStorageBuffer asset (#14663)
Adds a new `Handle<Storage>` asset type that can be used as a render
asset, particularly for use with `AsBindGroup`.

Closes: #13658 

# Objective

Allow users to create storage buffers in the main world without having
to access the `RenderDevice`. While this resource is technically
available, it's bad form to use in the main world and requires mixing
rendering details with main world code. Additionally, this makes storage
buffers easier to use with `AsBindGroup`, particularly in the following
scenarios:
- Sharing the same buffers between a compute stage and material shader.
We already have examples of this for storage textures (see game of life
example) and these changes allow a similar pattern to be used with
storage buffers.
- Preventing repeated gpu upload (see the previous easier to use `Vec`
`AsBindGroup` option).
- Allow initializing custom materials using `Default`. Previously, the
lack of a `Default` implement for the raw `wgpu::Buffer` type made
implementing a `AsBindGroup + Default` bound difficult in the presence
of buffers.

## Solution

Adds a new `Handle<Storage>` asset type that is prepared into a
`GpuStorageBuffer` render asset. This asset can either be initialized
with a `Vec<u8>` of properly aligned data or with a size hint. Users can
modify the underlying `wgpu::BufferDescriptor` to provide additional
usage flags.

## Migration Guide

The `AsBindGroup` `storage` attribute has been modified to reference the
new `Handle<Storage>` asset instead. Usages of Vec` should be converted
into assets instead.

---------

Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
2024-09-02 16:46:34 +00:00
Alice Cecile
a2fc9de16d
Add RenderSet::FinalCleanup for World::clear_entities (#14764)
# Objective

`World::clear_entities` is ambiguous with all of the other systems in
`RenderSet::Cleanup` because it access `&mut World`.

## Solution

I've added another system set variant, and made sure that this runs
after everything else.
 
## Testing

The `ambiguity_detection` example

## Migration Guide

`World::clear_entities` is now part of `RenderSet::PostCleanup` rather
than `RenderSet::Cleanup`. Your cleanup systems should likely stay in
`RenderSet::Cleanup`.

## Additional context

Spotted when working on #7386: this was responsible for a large number
of ambiguities.

This should be removed if / when #14449 is merged: there's no need to
call `clear_entities` at all if the rendering world is retained!
2024-08-15 18:48:43 +00:00
Tau Gärtli
aab1f8e435
Use #[doc(fake_variadic)] to improve docs readability (#14703)
# Objective

- Fixes #14697

## Solution

This PR modifies the existing `all_tuples!` macro to optionally accept a
`#[doc(fake_variadic)]` attribute in its input. If the attribute is
present, each invocation of the impl macro gets the correct attributes
(i.e. the first impl receives `#[doc(fake_variadic)]` while the other
impls are hidden using `#[doc(hidden)]`.
Impls for the empty tuple (unit type) are left untouched (that's what
the [standard
library](https://doc.rust-lang.org/std/cmp/trait.PartialEq.html#impl-PartialEq-for-())
and
[serde](https://docs.rs/serde/latest/serde/trait.Serialize.html#impl-Serialize-for-())
do).

To work around https://github.com/rust-lang/cargo/issues/8811 and to get
impls on re-exports to correctly show up as variadic, `--cfg docsrs_dep`
is passed when building the docs for the toplevel `bevy` crate.

`#[doc(fake_variadic)]` only works on tuples and fn pointers, so impls
for structs like `AnyOf<(T1, T2, ..., Tn)>` are unchanged.

## Testing

I built the docs locally using `RUSTDOCFLAGS='--cfg docsrs'
RUSTFLAGS='--cfg docsrs_dep' cargo +nightly doc --no-deps --workspace`
and checked the documentation page of a trait both in its original crate
and the re-exported version in `bevy`.
The description should correctly mention for how many tuple items the
trait is implemented.

I added `rustc-args` for docs.rs to the `bevy` crate, I hope there
aren't any other notable crates that re-export `#[doc(fake_variadic)]`
traits.

---

## Showcase

`bevy_ecs::query::QueryData`:
<img width="1015" alt="Screenshot 2024-08-12 at 16 41 28"
src="https://github.com/user-attachments/assets/d40136ed-6731-475f-91a0-9df255cd24e3">

`bevy::ecs::query::QueryData` (re-export):
<img width="1005" alt="Screenshot 2024-08-12 at 16 42 57"
src="https://github.com/user-attachments/assets/71d44cf0-0ab0-48b0-9a51-5ce332594e12">

## Original Description

<details>

Resolves #14697

Submitting as a draft for now, very WIP.

Unfortunately, the docs don't show the variadics nicely when looking at
reexported items.
For example:

`bevy_ecs::bundle::Bundle` correctly shows the variadic impl:

![image](https://github.com/user-attachments/assets/90bf8af1-1d1f-4714-9143-cdd3d0199998)

while `bevy::ecs::bundle::Bundle` (the reexport) shows all the impls
(not good):

![image](https://github.com/user-attachments/assets/439c428e-f712-465b-bec2-481f7bf5870b)

Built using `RUSTDOCFLAGS='--cfg docsrs' cargo +nightly doc --workspace
--no-deps` (`--no-deps` because of wgpu-core).

Maybe I missed something or this is a limitation in the *totally not
private* `#[doc(fake_variadic)]` thingy. In any case I desperately need
some sleep now :))

</details>
2024-08-12 18:54:33 +00:00
Patrick Walton
bc34216929
Pack multiple vertex and index arrays together into growable buffers. (#14257)
This commit uses the [`offset-allocator`] crate to combine vertex and
index arrays from different meshes into single buffers. Since the
primary source of `wgpu` overhead is from validation and synchronization
when switching buffers, this significantly improves Bevy's rendering
performance on many scenes.

This patch is a more flexible version of #13218, which also used slabs.
Unlike #13218, which used slabs of a fixed size, this commit implements
slabs that start small and can grow. In addition to reducing memory
usage, supporting slab growth reduces the number of vertex and index
buffer switches that need to happen during rendering, leading to
improved performance. To prevent pathological fragmentation behavior,
slabs are capped to a maximum size, and mesh arrays that are too large
get their own dedicated slabs.

As an additional improvement over #13218, this commit allows the
application to customize all allocator heuristics. The
`MeshAllocatorSettings` resource contains values that adjust the minimum
and maximum slab sizes, the cutoff point at which meshes get their own
dedicated slabs, and the rate at which slabs grow. Hopefully-sensible
defaults have been chosen for each value.

Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which
is necessary to pack vertex arrays from different meshes into the same
buffer. `wgpu` represents this restriction as the downlevel flag
`BASE_VERTEX`. This patch detects that bit and ensures that all vertex
buffers get dedicated slabs on that platform. Even on WebGL 2, though,
we can combine all *index* arrays into single buffers to reduce buffer
changes, and we do so.

The following measurements are on Bistro:

Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup):
![Screenshot 2024-07-09
163521](https://github.com/bevyengine/bevy/assets/157897/5d83c824-c0ee-434c-bbaf-218ff7212c48)

Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup):
![Screenshot 2024-07-09
163559](https://github.com/bevyengine/bevy/assets/157897/d94e2273-c3a0-496a-9f88-20d394129610)

Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup):
![Screenshot 2024-07-09
163536](https://github.com/bevyengine/bevy/assets/157897/e4ef6e48-d60e-44ae-9a71-b9a731c99d9a)

## Migration Guide

### Changed

* Vertex and index buffers for meshes may now be packed alongside other
buffers, for performance.
* `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that
it no longer directly stores handles to GPU objects.
* Because meshes no longer have their own vertex and index buffers, the
responsibility for the buffers has moved from `GpuMesh` (now called
`RenderMesh`) to the `MeshAllocator` resource. To access the vertex data
for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index
data for a mesh, use `MeshAllocator::mesh_index_slice`.

[`offset-allocator`]: https://github.com/pcwalton/offset-allocator
2024-07-16 20:33:15 +00:00
Pietro
061bee7e3c
fix: upgrade to winit v0.30 (#13366)
# Objective

- Upgrade winit to v0.30
- Fixes https://github.com/bevyengine/bevy/issues/13331

## Solution

This is a rewrite/adaptation of the new trait system described and
implemented in `winit` v0.30.

## Migration Guide

The custom UserEvent is now renamed as WakeUp, used to wake up the loop
if anything happens outside the app (a new
[custom_user_event](https://github.com/bevyengine/bevy/pull/13366/files#diff-2de8c0a8d3028d0059a3d80ae31b2bbc1cde2595ce2d317ea378fe3e0cf6ef2d)
shows this behavior.

The internal `UpdateState` has been removed and replaced internally by
the AppLifecycle. When changed, the AppLifecycle is sent as an event.

The `UpdateMode` now accepts only two values: `Continuous` and
`Reactive`, but the latter exposes 3 new properties to enable reactive
to device, user or window events. The previous `UpdateMode::Reactive` is
now equivalent to `UpdateMode::reactive()`, while
`UpdateMode::ReactiveLowPower` to `UpdateMode::reactive_low_power()`.

The `ApplicationLifecycle` has been renamed as `AppLifecycle`, and now
contains the possible values of the application state inside the event
loop:
* `Idle`: the loop has not started yet
* `Running` (previously called `Started`): the loop is running
* `WillSuspend`: the loop is going to be suspended
* `Suspended`: the loop is suspended
* `WillResume`: the loop is going to be resumed

Note: the `Resumed` state has been removed since the resumed app is just
running.

Finally, now that `winit` enables this, it extends the `WinitPlugin` to
support custom events.

## Test platforms

- [x] Windows
- [x] MacOs
- [x] Linux (x11)
- [x] Linux (Wayland)
- [x] Android
- [x] iOS
- [x] WASM/WebGPU
- [x] WASM/WebGL2

## Outstanding issues / regressions

- [ ] iOS: build failed in CI
   - blocking, but may just be flakiness
- [x] Cross-platform: when the window is maximised, changes in the scale
factor don't apply, to make them apply one has to make the window
smaller again. (Re-maximising keeps the updated scale factor)
    - non-blocking, but good to fix
- [ ] Android: it's pretty easy to quickly open and close the app and
then the music keeps playing when suspended.
    - non-blocking but worrying
- [ ]  Web: the application will hang when switching tabs
- Not new, duplicate of https://github.com/bevyengine/bevy/issues/13486
- [ ] Cross-platform?: Screenshot failure, `ERROR present_frames:
wgpu_core::present: No work has been submitted for this frame before`
taking the first screenshot, but after pressing space
    - non-blocking, but good to fix

---------

Co-authored-by: François <francois.mockers@vleue.com>
2024-06-03 13:06:48 +00:00
Lynn
450a9202d0
Common MeshBuilder trait (#13411)
# Objective

- All `ShapeMeshBuilder`s have some methods/implementations in common.
These are `fn build(&self) -> Mesh` and this implementation:
```rust
impl From<ShapeMeshBuilder> for Mesh { 
    fn from(builder: ShapeMeshBuilder) -> { 
        builder.build() 
    } 
}
``` 

- For the sake of consistency, these can be moved into a shared trait

## Solution

- Add `trait MeshBuilder` containing a `fn build(&self) -> Mesh` and
implementing `MeshBuilder for ShapeMeshBuilder`
- Implement `From<T: MeshBuilder> for Mesh`

## Migration Guide

- When calling `.build()` you need to import
`bevy_render::mesh::primitives::MeshBuilder`
2024-05-18 11:58:11 +00:00
andristarr
bb76a2c69c
multi_threaded feature rename (#12997)
# Objective

Fixes #12966

## Solution

Renaming multi_threaded feature to match snake case

## Migration Guide

Bevy feature multi-threaded should be refered to multi_threaded from now
on.
2024-05-06 20:49:32 +00:00
stinkytoe
ec418aa429
Re-export IntoDynamicImageError as public (#13223)
# Objective

in response to [13222](https://github.com/bevyengine/bevy/issues/13222)

## Solution

The Image trait was already re-exported in bevy_render/src/lib.rs, So I
added it inline there.

## Testing

Confirmed that it does compile. Simple change, shouldn't cause any
bugs/regressions.
2024-05-04 13:13:49 +00:00
arcashka
6027890a11
move wgsl color operations from bevy_pbr to bevy_render (#13209)
# Objective

`bevy_pbr/utils.wgsl` shader file contains mathematical constants and
color conversion functions. Both of those should be accessible without
enabling `bevy_pbr` feature. For example, tonemapping can be done in non
pbr scenario, and it uses color conversion functions.

Fixes #13207

## Solution

* Move mathematical constants (such as PI, E) from
`bevy_pbr/src/render/utils.wgsl` into `bevy_render/src/maths.wgsl`
* Move color conversion functions from `bevy_pbr/src/render/utils.wgsl`
into new file `bevy_render/src/color_operations.wgsl`

## Testing
Ran multiple examples, checked they are working:
* tonemapping
* color_grading
* 3d_scene
* animated_material
* deferred_rendering
* 3d_shapes
* fog
* irradiance_volumes
* meshlet
* parallax_mapping
* pbr
* reflection_probes
* shadow_biases
* 2d_gizmos
* light_gizmos
---

## Changelog
* Moved mathematical constants (such as PI, E) from
`bevy_pbr/src/render/utils.wgsl` into `bevy_render/src/maths.wgsl`
* Moved color conversion functions from `bevy_pbr/src/render/utils.wgsl`
into new file `bevy_render/src/color_operations.wgsl`

## Migration Guide
In user's shader code replace usage of mathematical constants from
`bevy_pbr::utils` to the usage of the same constants from
`bevy_render::maths`.
2024-05-04 10:30:23 +00:00
Patrick Walton
16531fb3e3
Implement GPU frustum culling. (#12889)
This commit implements opt-in GPU frustum culling, built on top of the
infrastructure in https://github.com/bevyengine/bevy/pull/12773. To
enable it on a camera, add the `GpuCulling` component to it. To
additionally disable CPU frustum culling, add the `NoCpuCulling`
component. Note that adding `GpuCulling` without `NoCpuCulling`
*currently* does nothing useful. The reason why `GpuCulling` doesn't
automatically imply `NoCpuCulling` is that I intend to follow this patch
up with GPU two-phase occlusion culling, and CPU frustum culling plus
GPU occlusion culling seems like a very commonly-desired mode.

Adding the `GpuCulling` component to a view puts that view into
*indirect mode*. This mode makes all drawcalls indirect, relying on the
mesh preprocessing shader to allocate instances dynamically. In indirect
mode, the `PreprocessWorkItem` `output_index` points not to a
`MeshUniform` instance slot but instead to a set of `wgpu`
`IndirectParameters`, from which it allocates an instance slot
dynamically if frustum culling succeeds. Batch building has been updated
to allocate and track indirect parameter slots, and the AABBs are now
supplied to the GPU as `MeshCullingData`.

A small amount of code relating to the frustum culling has been borrowed
from meshlets and moved into `maths.wgsl`. Note that standard Bevy
frustum culling uses AABBs, while meshlets use bounding spheres; this
means that not as much code can be shared as one might think.

This patch doesn't provide any way to perform GPU culling on shadow
maps, to avoid making this patch bigger than it already is. That can be
a followup.

## Changelog

### Added

* Frustum culling can now optionally be done on the GPU. To enable it,
add the `GpuCulling` component to a camera.
* To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note
that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
2024-04-28 12:50:00 +00:00
robtfm
91a393a9e2
Throttle render assets (#12622)
# Objective

allow throttling of gpu uploads to prevent choppy framerate when many
textures/meshes are loaded in.

## Solution

- `RenderAsset`s can implement `byte_len()` which reports their size.
implemented this for `Mesh` and `Image`
- users can add a `RenderAssetBytesPerFrame` which specifies max bytes
to attempt to upload in a frame
- `render_assets::<A>` checks how many bytes have been written before
attempting to upload assets. the limit is a soft cap: assets will be
written until the total has exceeded the cap, to ensure some forward
progress every frame

notes:
- this is a stopgap until we have multiple wgpu queues for proper
streaming of data
- requires #12606

issues
- ~~fonts sometimes only partially upload. i have no clue why, needs to
be fixed~~ fixed now.
- choosing the #bytes is tricky as it should be hardware / framerate
dependent
- many features are not tested (env maps, light probes, etc) - they
won't break unless `RenderAssetBytesPerFrame` is explicitly used though

---------

Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-04-26 23:43:33 +00:00
BD103
7b8d502083
Fix beta lints (#12980)
# Objective

- Fixes #12976

## Solution

This one is a doozy.

- Run `cargo +beta clippy --workspace --all-targets --all-features` and
fix all issues
- This includes:
- Moving inner attributes to be outer attributes, when the item in
question has both inner and outer attributes
  - Use `ptr::from_ref` in more scenarios
- Extend the valid idents list used by `clippy:doc_markdown` with more
names
  - Use `Clone::clone_from` when possible
  - Remove redundant `ron` import
  - Add backticks to **so many** identifiers and items
    - I'm sorry whoever has to review this

---

## Changelog

- Added links to more identifiers in documentation.
2024-04-16 02:46:46 +00:00
Robert Swain
ab7cbfa8fc
Consolidate Render(Ui)Materials(2d) into RenderAssets (#12827)
# Objective

- Replace `RenderMaterials` / `RenderMaterials2d` / `RenderUiMaterials`
with `RenderAssets` to enable implementing changes to one thing,
`RenderAssets`, that applies to all use cases rather than duplicating
changes everywhere for multiple things that should be one thing.
- Adopts #8149 

## Solution

- Make RenderAsset generic over the destination type rather than the
source type as in #8149
- Use `RenderAssets<PreparedMaterial<M>>` etc for render materials

---

## Changelog

- Changed:
- The `RenderAsset` trait is now implemented on the destination type.
Its `SourceAsset` associated type refers to the type of the source
asset.
- `RenderMaterials`, `RenderMaterials2d`, and `RenderUiMaterials` have
been replaced by `RenderAssets<PreparedMaterial<M>>` and similar.

## Migration Guide

- `RenderAsset` is now implemented for the destination type rather that
the source asset type. The source asset type is now the `RenderAsset`
trait's `SourceAsset` associated type.
2024-04-09 13:26:34 +00:00
François Mockers
93fd02e8ea
remove DeterministicRenderingConfig (#12811)
# Objective

- Since #12453, `DeterministicRenderingConfig` doesn't do anything

## Solution

- Remove it

---

## Migration Guide

- Removed `DeterministicRenderingConfig`. There shouldn't be any z
fighting anymore in the rendering even without setting
`stable_sort_z_fighting`
2024-04-01 09:32:47 +00:00
Cameron
01649f13e2
Refactor App and SubApp internals for better separation (#9202)
# Objective

This is a necessary precursor to #9122 (this was split from that PR to
reduce the amount of code to review all at once).

Moving `!Send` resource ownership to `App` will make it unambiguously
`!Send`. `SubApp` must be `Send`, so it can't wrap `App`.

## Solution

Refactor `App` and `SubApp` to not have a recursive relationship. Since
`SubApp` no longer wraps `App`, once `!Send` resources are moved out of
`World` and into `App`, `SubApp` will become unambiguously `Send`.

There could be less code duplication between `App` and `SubApp`, but
that would break `App` method chaining.

## Changelog

- `SubApp` no longer wraps `App`.
- `App` fields are no longer publicly accessible.
- `App` can no longer be converted into a `SubApp`.
- Various methods now return references to a `SubApp` instead of an
`App`.
## Migration Guide

- To construct a sub-app, use `SubApp::new()`. `App` can no longer
convert into `SubApp`.
- If you implemented a trait for `App`, you may want to implement it for
`SubApp` as well.
- If you're accessing `app.world` directly, you now have to use
`app.world()` and `app.world_mut()`.
- `App::sub_app` now returns `&SubApp`.
- `App::sub_app_mut`  now returns `&mut SubApp`.
- `App::get_sub_app` now returns `Option<&SubApp>.`
- `App::get_sub_app_mut` now returns `Option<&mut SubApp>.`
2024-03-31 03:16:10 +00:00
Patrick Walton
4dadebd9c4
Improve performance by binning together opaque items instead of sorting them. (#12453)
Today, we sort all entities added to all phases, even the phases that
don't strictly need sorting, such as the opaque and shadow phases. This
results in a performance loss because our `PhaseItem`s are rather large
in memory, so sorting is slow. Additionally, determining the boundaries
of batches is an O(n) process.

This commit makes Bevy instead applicable place phase items into *bins*
keyed by *bin keys*, which have the invariant that everything in the
same bin is potentially batchable. This makes determining batch
boundaries O(1), because everything in the same bin can be batched.
Instead of sorting each entity, we now sort only the bin keys. This
drops the sorting time to near-zero on workloads with few bins like
`many_cubes --no-frustum-culling`. Memory usage is improved too, with
batch boundaries and dynamic indices now implicit instead of explicit.
The improved memory usage results in a significant win even on
unbatchable workloads like `many_cubes --no-frustum-culling
--vary-material-data-per-instance`, presumably due to cache effects.

Not all phases can be binned; some, such as transparent and transmissive
phases, must still be sorted. To handle this, this commit splits
`PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the
logic that today deals with `PhaseItem`s has been moved to
`SortedPhaseItem`. `BinnedPhaseItem` has the new logic.

Frame time results (in ms/frame) are as follows:

| Benchmark                | `binning` | `main`  | Speedup |
| ------------------------ | --------- | ------- | ------- |
| `many_cubes -nfc -vpi` | 232.179     | 312.123   | 34.43%  |
| `many_cubes -nfc`        | 25.874 | 30.117 | 16.40%  |
| `many_foxes`             | 3.276 | 3.515 | 7.30%   |

(`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for
`--vary-per-instance`.)

---

## Changelog

### Changed

* Render phases have been split into binned and sorted phases. Binned
phases, such as the common opaque phase, achieve improved CPU
performance by avoiding the sorting step.

## Migration Guide

- `PhaseItem` has been split into `BinnedPhaseItem` and
`SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need
to migrate them to one of these two types. `SortedPhaseItem` requires
the fewest code changes, but you may want to pick `BinnedPhaseItem` if
your phase doesn't require sorting, as that enables higher performance.

## Tracy graphs

`many-cubes --no-frustum-culling`, `main` branch:
<img width="1064" alt="Screenshot 2024-03-12 180037"
src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55">

`many-cubes --no-frustum-culling`, this branch:
<img width="1064" alt="Screenshot 2024-03-12 180011"
src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c">

You can see that `batch_and_prepare_binned_render_phase` is a much
smaller fraction of the time. Zooming in on that function, with yellow
being this branch and red being `main`, we see:

<img width="1064" alt="Screenshot 2024-03-12 175832"
src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0">

The binning happens in `queue_material_meshes`. Again with yellow being
this branch and red being `main`:
<img width="1064" alt="Screenshot 2024-03-12 175755"
src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96">

We can see that there is a small regression in `queue_material_meshes`
performance, but it's not nearly enough to outweigh the large gains in
`batch_and_prepare_binned_render_phase`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
James Liu
56bcbb0975
Forbid unsafe in most crates in the engine (#12684)
# Objective
Resolves #3824. `unsafe` code should be the exception, not the norm in
Rust. It's obviously needed for various use cases as it's interfacing
with platforms and essentially running the borrow checker at runtime in
the ECS, but the touted benefits of Bevy is that we are able to heavily
leverage Rust's safety, and we should be holding ourselves accountable
to that by minimizing our unsafe footprint.

## Solution
Deny `unsafe_code` workspace wide. Add explicit exceptions for the
following crates, and forbid it in almost all of the others.

* bevy_ecs - Obvious given how much unsafe is needed to achieve
performant results
* bevy_ptr - Works with raw pointers, even more low level than bevy_ecs.
 * bevy_render - due to needing to integrate with wgpu
 * bevy_window - due to needing to integrate with raw_window_handle
* bevy_utils - Several unsafe utilities used by bevy_ecs. Ideally moved
into bevy_ecs instead of made publicly usable.
 * bevy_reflect - Required for the unsafe type casting it's doing.
 * bevy_transform - for the parallel transform propagation
 * bevy_gizmos  - For the SystemParam impls it has.
* bevy_assets - To support reflection. Might not be required, not 100%
sure yet.
* bevy_mikktspace - due to being a conversion from a C library. Pending
safe rewrite.
* bevy_dynamic_plugin - Inherently unsafe due to the dynamic loading
nature.

Several uses of unsafe were rewritten, as they did not need to be using
them:

* bevy_text - a case of `Option::unchecked` could be rewritten as a
normal for loop and match instead of an iterator.
* bevy_color - the Pod/Zeroable implementations were replaceable with
bytemuck's derive macros.
2024-03-27 03:30:08 +00:00
Ian Kettlewell
b35974010b
Get Bevy building for WebAssembly with multithreading (#12205)
# Objective

This gets Bevy building on Wasm when the `atomics` flag is enabled. This
does not yet multithread Bevy itself, but it allows Bevy users to use a
crate like `wasm_thread` to spawn their own threads and manually
parallelize work. This is a first step towards resolving #4078 . Also
fixes #9304.

This provides a foothold so that Bevy contributors can begin to think
about multithreaded Wasm's constraints and Bevy can work towards changes
to get the engine itself multithreaded.

Some flags need to be set on the Rust compiler when compiling for Wasm
multithreading. Here's what my build script looks like, with the correct
flags set, to test out Bevy examples on web:

```bash
set -e
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' \
     cargo build --example breakout --target wasm32-unknown-unknown -Z build-std=std,panic_abort --release
 wasm-bindgen --out-name wasm_example \
   --out-dir examples/wasm/target \
   --target web target/wasm32-unknown-unknown/release/examples/breakout.wasm
 devserver --header Cross-Origin-Opener-Policy='same-origin' --header Cross-Origin-Embedder-Policy='require-corp' --path examples/wasm
```

A few notes:

1. `cpal` crashes immediately when the `atomics` flag is set. That is
patched in https://github.com/RustAudio/cpal/pull/837, but not yet in
the latest crates.io release.

That can be temporarily worked around by patching Cpal like so:
```toml
[patch.crates-io]
cpal = { git = "https://github.com/RustAudio/cpal" }
```

2. When testing out `wasm_thread` you need to enable the `es_modules`
feature.

## Solution

The largest obstacle to compiling Bevy with `atomics` on web is that
`wgpu` types are _not_ Send and Sync. Longer term Bevy will need an
approach to handle that, but in the near term Bevy is already configured
to be single-threaded on web.

Therefor it is enough to wrap `wgpu` types in a
`send_wrapper::SendWrapper` that _is_ Send / Sync, but panics if
accessed off the `wgpu` thread.

---

## Changelog

- `wgpu` types that are not `Send` are wrapped in
`send_wrapper::SendWrapper` on Wasm + 'atomics'
- CommandBuffers are not generated in parallel on Wasm + 'atomics'

## Questions
- Bevy should probably add CI checks to make sure this doesn't regress.
Should that go in this PR or a separate PR? **Edit:** Added checks to
build Wasm with atomics

---------

Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: daxpedda <daxpedda@gmail.com>
Co-authored-by: François <francois.mockers@vleue.com>
2024-03-25 19:10:18 +00:00
James Liu
f096ad4155
Set the logo and favicon for all of Bevy's published crates (#12696)
# Objective
Currently the built docs only shows the logo and favicon for the top
level `bevy` crate. This makes views like
https://docs.rs/bevy_ecs/latest/bevy_ecs/ look potentially unrelated to
the project at first glance.

## Solution
Reproduce the docs attributes for every crate that Bevy publishes.

Ideally this would be done with some workspace level Cargo.toml control,
but AFAICT, such support does not exist.
2024-03-25 18:52:50 +00:00
JMS55
93b4c6c9a2
Add iOS to synchronous_pipeline_compilation docs (#12694)
iOS uses Metal, so it has the same limitation as macOS, presumably.
2024-03-24 22:01:55 +00:00
LeshaInc
737b719dda
Add pipeline statistics (#9135)
# Objective

It's useful to have access to render pipeline statistics, since they
provide more information than FPS alone. For example, the number of
drawn triangles can be used to debug culling and LODs. The number of
fragment shader invocations can provide a more stable alternative metric
than GPU elapsed time.

See also: Render node GPU timing overlay #8067, which doesn't provide
pipeline statistics, but adds a nice overlay.

## Solution

Add `RenderDiagnosticsPlugin`, which enables collecting pipeline
statistics and CPU & GPU timings.

---

## Changelog

- Add `RenderDiagnosticsPlugin`
- Add `RenderContext::diagnostic_recorder` method

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-03-17 20:29:35 +00:00
Al M
52e3f2007b
Add "all-features = true" to docs.rs metadata for most crates (#12366)
# Objective

Fix missing `TextBundle` (and many others) which are present in the main
crate as default features but optional in the sub-crate. See:

- https://docs.rs/bevy/0.13.0/bevy/ui/node_bundles/index.html
- https://docs.rs/bevy_ui/0.13.0/bevy_ui/node_bundles/index.html

~~There are probably other instances in other crates that I could track
down, but maybe "all-features = true" should be used by default in all
sub-crates? Not sure.~~ (There were many.) I only noticed this because
rust-analyzer's "open docs" features takes me to the sub-crate, not the
main one.

## Solution

Add "all-features = true" to docs.rs metadata for crates that use
features.

## Changelog

### Changed

- Unified features documented on docs.rs between main crate and
sub-crates
2024-03-08 20:03:09 +00:00
James Liu
9e5db9abc7
Clean up type registrations (#12314)
# Objective
Fix #12304. Remove unnecessary type registrations thanks to #4154.

## Solution
Conservatively remove type registrations. Keeping the top level
components, resources, and events, but dropping everything else that is
a type of a member of those types.
2024-03-06 16:05:53 +00:00
Alice Cecile
599e5e4e76
Migrate from LegacyColor to bevy_color::Color (#12163)
# Objective

- As part of the migration process we need to a) see the end effect of
the migration on user ergonomics b) check for serious perf regressions
c) actually migrate the code
- To accomplish this, I'm going to attempt to migrate all of the
remaining user-facing usages of `LegacyColor` in one PR, being careful
to keep a clean commit history.
- Fixes #12056.

## Solution

I've chosen to use the polymorphic `Color` type as our standard
user-facing API.

- [x] Migrate `bevy_gizmos`.
- [x] Take `impl Into<Color>` in all `bevy_gizmos` APIs
- [x] Migrate sprites
- [x] Migrate UI
- [x] Migrate `ColorMaterial`
- [x] Migrate `MaterialMesh2D`
- [x] Migrate fog
- [x] Migrate lights
- [x] Migrate StandardMaterial
- [x] Migrate wireframes
- [x] Migrate clear color
- [x] Migrate text
- [x] Migrate gltf loader
- [x] Register color types for reflection
- [x] Remove `LegacyColor`
- [x] Make sure CI passes

Incidental improvements to ease migration:

- added `Color::srgba_u8`, `Color::srgba_from_array` and friends
- added `set_alpha`, `is_fully_transparent` and `is_fully_opaque` to the
`Alpha` trait
- add and immediately deprecate (lol) `Color::rgb` and friends in favor
of more explicit and consistent `Color::srgb`
- standardized on white and black for most example text colors
- added vector field traits to `LinearRgba`: ~~`Add`, `Sub`,
`AddAssign`, `SubAssign`,~~ `Mul<f32>` and `Div<f32>`. Multiplications
and divisions do not scale alpha. `Add` and `Sub` have been cut from
this PR.
- added `LinearRgba` and `Srgba` `RED/GREEN/BLUE`
- added `LinearRgba_to_f32_array` and `LinearRgba::to_u32`

## Migration Guide

Bevy's color types have changed! Wherever you used a
`bevy::render::Color`, a `bevy::color::Color` is used instead.

These are quite similar! Both are enums storing a color in a specific
color space (or to be more precise, using a specific color model).
However, each of the different color models now has its own type.

TODO...

- `Color::rgba`, `Color::rgb`, `Color::rbga_u8`, `Color::rgb_u8`,
`Color::rgb_from_array` are now `Color::srgba`, `Color::srgb`,
`Color::srgba_u8`, `Color::srgb_u8` and `Color::srgb_from_array`.
- `Color::set_a` and `Color::a` is now `Color::set_alpha` and
`Color::alpha`. These are part of the `Alpha` trait in `bevy_color`.
- `Color::is_fully_transparent` is now part of the `Alpha` trait in
`bevy_color`
- `Color::r`, `Color::set_r`, `Color::with_r` and the equivalents for
`g`, `b` `h`, `s` and `l` have been removed due to causing silent
relatively expensive conversions. Convert your `Color` into the desired
color space, perform your operations there, and then convert it back
into a polymorphic `Color` enum.
- `Color::hex` is now `Srgba::hex`. Call `.into` or construct a
`Color::Srgba` variant manually to convert it.
- `WireframeMaterial`, `ExtractedUiNode`, `ExtractedDirectionalLight`,
`ExtractedPointLight`, `ExtractedSpotLight` and `ExtractedSprite` now
store a `LinearRgba`, rather than a polymorphic `Color`
- `Color::rgb_linear` and `Color::rgba_linear` are now
`Color::linear_rgb` and `Color::linear_rgba`
- The various CSS color constants are no longer stored directly on
`Color`. Instead, they're defined in the `Srgba` color space, and
accessed via `bevy::color::palettes::css`. Call `.into()` on them to
convert them into a `Color` for quick debugging use, and consider using
the much prettier `tailwind` palette for prototyping.
- The `LIME_GREEN` color has been renamed to `LIMEGREEN` to comply with
the standard naming.
- Vector field arithmetic operations on `Color` (add, subtract, multiply
and divide by a f32) have been removed. Instead, convert your colors
into `LinearRgba` space, and perform your operations explicitly there.
This is particularly relevant when working with emissive or HDR colors,
whose color channel values are routinely outside of the ordinary 0 to 1
range.
- `Color::as_linear_rgba_f32` has been removed. Call
`LinearRgba::to_f32_array` instead, converting if needed.
- `Color::as_linear_rgba_u32` has been removed. Call
`LinearRgba::to_u32` instead, converting if needed.
- Several other color conversion methods to transform LCH or HSL colors
into float arrays or `Vec` types have been removed. Please reimplement
these externally or open a PR to re-add them if you found them
particularly useful.
- Various methods on `Color` such as `rgb` or `hsl` to convert the color
into a specific color space have been removed. Convert into
`LinearRgba`, then to the color space of your choice.
- Various implicitly-converting color value methods on `Color` such as
`r`, `g`, `b` or `h` have been removed. Please convert it into the color
space of your choice, then check these properties.
- `Color` no longer implements `AsBindGroup`. Store a `LinearRgba`
internally instead to avoid conversion costs.

---------

Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com>
Co-authored-by: Afonso Lage <lage.afonso@gmail.com>
Co-authored-by: Rob Parrett <robparrett@gmail.com>
Co-authored-by: Zachary Harrold <zac@harrold.com.au>
2024-02-29 19:35:12 +00:00
Alice Cecile
de004da8d5
Rename bevy_render::Color to LegacyColor (#12069)
# Objective

The migration process for `bevy_color` (#12013) will be fairly involved:
there will be hundreds of affected files, and a large number of APIs.

## Solution

To allow us to proceed granularly, we're going to keep both
`bevy_color::Color` (new) and `bevy_render::Color` (old) around until
the migration is complete.

However, simply doing this directly is confusing! They're both called
`Color`, making it very hard to tell when a portion of the code has been
ported.

As discussed in #12056, by renaming the old `Color` type, we can make it
easier to gradually migrate over, one API at a time.

## Migration Guide

THIS MIGRATION GUIDE INTENTIONALLY LEFT BLANK.

This change should not be shipped to end users: delete this section in
the final migration guide!

---------

Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com>
2024-02-24 21:35:32 +00:00
Sam Pettersson
caa7ec68d4
FIX: iOS Simulator not rendering due to missing CUBE_ARRAY_TEXTURES (#12052)
This PR closes #11978

# Objective

Fix rendering on iOS Simulators.

iOS Simulator doesn't support the capability CUBE_ARRAY_TEXTURES, since
0.13 this started to make iOS Simulator not render anything with the
following message being outputted:

```
2024-02-19T14:59:34.896266Z ERROR bevy_render::render_resource::pipeline_cache: failed to create shader module: Validation Error

Caused by:
    In Device::create_shader_module
    
Shader validation error: 


    Type [40] '' is invalid
    Capability Capabilities(CUBE_ARRAY_TEXTURES) is required
```

## Solution

- Split up NO_ARRAY_TEXTURES_SUPPORT into both NO_ARRAY_TEXTURES_SUPPORT
and NO_CUBE_ARRAY_TEXTURES_SUPPORT and correctly apply
NO_ARRAY_TEXTURES_SUPPORT for iOS Simulator using the cfg flag
introduced in #10178.

---

## Changelog

### Fixed
- Rendering on iOS Simulator due to missing CUBE_ARRAY_TEXTURES support.

---------

Co-authored-by: Sam Pettersson <sam.pettersson@geoguessr.com>
2024-02-23 01:24:59 +00:00
Matty
328008f904
Move AlphaMode into bevy_render (#12012)
# Objective

- Closes #11985

## Solution

- alpha.rs has been moved from bevy_pbr into bevy_render; bevy_pbr and
bevy_gltf now access `AlphaMode` through bevy_render.

---

## Migration Guide

In the present implementation, external consumers of `AlphaMode` will
have to access it through bevy_render rather than through bevy_pbr,
changing their import from `bevy_pbr::AlphaMode` to
`bevy_render::alpha::AlphaMode` (or the corresponding glob import from
`bevy_pbr::prelude::*` to `bevy_render::prelude::*`).

## Uncertainties

Some remaining things from this that I am uncertain about:
- Here, the `app.register_type<AlphaMode>()` call has been moved from
`PbrPlugin` to `RenderPlugin`; I'm not sure if this is quite right, and
I was unable to find any direct relationship between `PbrPlugin` and
`RenderPlugin`.
- `AlphaMode` was placed in the prelude of bevy_render. I'm not certain
that this is actually appropriate.
- bevy_pbr does not re-export `AlphaMode`, which makes this a breaking
change for external consumers.

Any of these things could be easily changed; I'm just not confident that
I necessarily adopted the right approach in these (known) ways since
this codebase and ecosystem is quite new to me.
2024-02-21 19:34:10 +00:00
James Liu
bc82749012
Remove APIs deprecated in 0.13 (#11974)
# Objective
We deprecated quite a few APIs in 0.13. 0.13 has shipped already. It
should be OK to remove them in 0.14's release. Fixes #4059. Fixes #9011.

## Solution
Remove them.
2024-02-19 19:04:47 +00:00
Rob Parrett
ebaa347afe
Add configuration for async pipeline creation on RenderPlugin (#11847)
# Objective

Fixes #11846

## Solution

Add a `synchronous_pipeline_compilation ` field to `RenderPlugin`,
defaulting to `false`.

Most of the diff is whitespace.

## Changelog

Added `synchronous_pipeline_compilation ` to `RenderPlugin` for
disabling async pipeline creation.

## Migration Guide

TODO: consider combining this with the guide for #11846

`RenderPlugin` has a new `synchronous_pipeline_compilation ` property.
The default value is `false`. Set this to `true` if you want to retain
the previous synchronous behavior.

---------

Co-authored-by: JMS55 <47158642+JMS55@users.noreply.github.com>
Co-authored-by: François <mockersf@gmail.com>
2024-02-16 13:35:47 +00:00
Tristan Guichaoua
694c06f3d0
Inverse missing_docs logic (#11676)
# Objective

Currently the `missing_docs` lint is allowed-by-default and enabled at
crate level when their documentations is complete (see #3492).
This PR proposes to inverse this logic by making `missing_docs`
warn-by-default and mark crates with imcomplete docs allowed.

## Solution

Makes `missing_docs` warn at workspace level and allowed at crate level
when the docs is imcomplete.
2024-02-03 21:40:55 +00:00
Mike
a919cb0a17
Don't auto insert on the extract schedule (#11669)
# Objective

- In #9822 I forgot to disable auto sync points on the Extract Schedule.
We want to do this because the Commands on the Extract Schedule should
be applied on the render thread.
2024-02-03 05:04:57 +00:00
Lixou
16d28ccb91
RenderGraph Labelization (#10644)
# Objective

The whole `Cow<'static, str>` naming for nodes and subgraphs in
`RenderGraph` is a mess.

## Solution

Replaces hardcoded and potentially overlapping strings for nodes and
subgraphs inside `RenderGraph` with bevy's labelsystem.

---

## Changelog

* Two new labels: `RenderLabel` and `RenderSubGraph`.
* Replaced all uses for hardcoded strings with those labels
* Moved `Taa` label from its own mod to all the other `Labels3d`
* `add_render_graph_edges` now needs a tuple of labels
* Moved `ScreenSpaceAmbientOcclusion` label from its own mod with the
`ShadowPass` label to `LabelsPbr`
* Removed  `NodeId`
* Renamed `Edges.id()` to `Edges.label()`
* Removed `NodeLabel`
* Changed examples according to the new label system
* Introduced new `RenderLabel`s: `Labels2d`, `Labels3d`, `LabelsPbr`,
`LabelsUi`
* Introduced new `RenderSubGraph`s: `SubGraph2d`, `SubGraph3d`,
`SubGraphUi`
* Removed `Reflect` and `Default` derive from `CameraRenderGraph`
component struct
* Improved some error messages

## Migration Guide

For Nodes and SubGraphs, instead of using hardcoded strings, you now
pass labels, which can be derived with structs and enums.

```rs
// old
#[derive(Default)]
struct MyRenderNode;
impl MyRenderNode {
    pub const NAME: &'static str = "my_render_node"
}

render_app
    .add_render_graph_node::<ViewNodeRunner<MyRenderNode>>(
        core_3d::graph::NAME,
        MyRenderNode::NAME,
    )
    .add_render_graph_edges(
        core_3d::graph::NAME,
        &[
            core_3d::graph::node::TONEMAPPING,
            MyRenderNode::NAME,
            core_3d::graph::node::END_MAIN_PASS_POST_PROCESSING,
        ],
    );

// new
use bevy::core_pipeline::core_3d::graph::{Labels3d, SubGraph3d};

#[derive(Debug, Hash, PartialEq, Eq, Clone, RenderLabel)]
pub struct MyRenderLabel;

#[derive(Default)]
struct MyRenderNode;

render_app
    .add_render_graph_node::<ViewNodeRunner<MyRenderNode>>(
        SubGraph3d,
        MyRenderLabel,
    )
    .add_render_graph_edges(
        SubGraph3d,
        (
            Labels3d::Tonemapping,
            MyRenderLabel,
            Labels3d::EndMainPassPostProcessing,
        ),
    );
```

### SubGraphs

#### in `bevy_core_pipeline::core_2d::graph`
| old string-based path | new label |
|-----------------------|-----------|
| `NAME` | `SubGraph2d` |

#### in `bevy_core_pipeline::core_3d::graph`
| old string-based path | new label |
|-----------------------|-----------|
| `NAME` | `SubGraph3d` |

#### in `bevy_ui::render`
| old string-based path | new label |
|-----------------------|-----------|
| `draw_ui_graph::NAME` | `graph::SubGraphUi` |

### Nodes

#### in `bevy_core_pipeline::core_2d::graph`
| old string-based path | new label |
|-----------------------|-----------|
| `node::MSAA_WRITEBACK` | `Labels2d::MsaaWriteback` | 
| `node::MAIN_PASS` | `Labels2d::MainPass` | 
| `node::BLOOM` | `Labels2d::Bloom` | 
| `node::TONEMAPPING` | `Labels2d::Tonemapping` | 
| `node::FXAA` | `Labels2d::Fxaa` | 
| `node::UPSCALING` | `Labels2d::Upscaling` | 
| `node::CONTRAST_ADAPTIVE_SHARPENING` |
`Labels2d::ConstrastAdaptiveSharpening` |
| `node::END_MAIN_PASS_POST_PROCESSING` |
`Labels2d::EndMainPassPostProcessing` |

#### in `bevy_core_pipeline::core_3d::graph`
| old string-based path | new label |
|-----------------------|-----------|
| `node::MSAA_WRITEBACK` | `Labels3d::MsaaWriteback` | 
| `node::PREPASS` | `Labels3d::Prepass` | 
| `node::DEFERRED_PREPASS` | `Labels3d::DeferredPrepass` | 
| `node::COPY_DEFERRED_LIGHTING_ID` | `Labels3d::CopyDeferredLightingId`
|
| `node::END_PREPASSES` | `Labels3d::EndPrepasses` | 
| `node::START_MAIN_PASS` | `Labels3d::StartMainPass` | 
| `node::MAIN_OPAQUE_PASS` | `Labels3d::MainOpaquePass` | 
| `node::MAIN_TRANSMISSIVE_PASS` | `Labels3d::MainTransmissivePass` | 
| `node::MAIN_TRANSPARENT_PASS` | `Labels3d::MainTransparentPass` | 
| `node::END_MAIN_PASS` | `Labels3d::EndMainPass` | 
| `node::BLOOM` | `Labels3d::Bloom` | 
| `node::TONEMAPPING` | `Labels3d::Tonemapping` | 
| `node::FXAA` | `Labels3d::Fxaa` | 
| `node::UPSCALING` | `Labels3d::Upscaling` | 
| `node::CONTRAST_ADAPTIVE_SHARPENING` |
`Labels3d::ContrastAdaptiveSharpening` |
| `node::END_MAIN_PASS_POST_PROCESSING` |
`Labels3d::EndMainPassPostProcessing` |

#### in `bevy_core_pipeline`
| old string-based path | new label |
|-----------------------|-----------|
| `taa::draw_3d_graph::node::TAA` | `Labels3d::Taa` |

#### in `bevy_pbr`
| old string-based path | new label |
|-----------------------|-----------|
| `draw_3d_graph::node::SHADOW_PASS` | `LabelsPbr::ShadowPass` |
| `ssao::draw_3d_graph::node::SCREEN_SPACE_AMBIENT_OCCLUSION` |
`LabelsPbr::ScreenSpaceAmbientOcclusion` |
| `deferred::DEFFERED_LIGHTING_PASS` | `LabelsPbr::DeferredLightingPass`
|

#### in `bevy_render`
| old string-based path | new label |
|-----------------------|-----------|
| `main_graph::node::CAMERA_DRIVER` | `graph::CameraDriverLabel` |

#### in `bevy_ui::render`
| old string-based path | new label |
|-----------------------|-----------|
| `draw_ui_graph::node::UI_PASS` | `graph::LabelsUi::UiPass` |

---

## Future work

* Make `NodeSlot`s also use types. Ideally, we have an enum with unit
variants where every variant resembles one slot. Then to make sure you
are using the right slot enum and make rust-analyzer play nicely with
it, we should make an associated type in the `Node` trait. With today's
system, we can introduce 3rd party slots to a node, and i wasnt sure if
this was used, so I didn't do this in this PR.

## Unresolved Questions

When looking at the `post_processing` example, we have a struct for the
label and a struct for the node, this seems like boilerplate and on
discord, @IceSentry (sowy for the ping)
[asked](https://discord.com/channels/691052431525675048/743663924229963868/1175197016947699742)
if a node could automatically introduce a label (or i completely
misunderstood that). The problem with that is, that nodes like
`EmptyNode` exist multiple times *inside the same* (sub)graph, so there
we need extern labels to distinguish between those. Hopefully we can
find a way to reduce boilerplate and still have everything unique. For
EmptyNode, we could maybe make a macro which implements an "empty node"
for a type, but for nodes which contain code and need to be present
multiple times, this could get nasty...
2024-01-31 14:51:19 +00:00
Joona Aalto
2bf481c03b
Add Meshable trait and implement meshing for 2D primitives (#11431)
# Objective

The first part of #10569, split up from #11007.

The goal is to implement meshing support for Bevy's new geometric
primitives, starting with 2D primitives. 3D meshing will be added in a
follow-up, and we can consider removing the old mesh shapes completely.

## Solution

Add a `Meshable` trait that primitives need to implement to support
meshing, as suggested by the
[RFC](https://github.com/bevyengine/rfcs/blob/main/rfcs/12-primitive-shapes.md#meshing).

```rust
/// A trait for shapes that can be turned into a [`Mesh`].
pub trait Meshable {
    /// The output of [`Self::mesh`]. This can either be a [`Mesh`]
    /// or a builder used for creating a [`Mesh`].
    type Output;

    /// Creates a [`Mesh`] for a shape.
    fn mesh(&self) -> Self::Output;
}
```

This PR implements it for the following primitives:

- `Circle`
- `Ellipse`
- `Rectangle`
- `RegularPolygon`
- `Triangle2d`

The `mesh` method typically returns a builder-like struct such as
`CircleMeshBuilder`. This is needed to support shape-specific
configuration for things like mesh resolution or UV configuration:

```rust
meshes.add(Circle { radius: 0.5 }.mesh().resolution(64));
```

Note that if no configuration is needed, you can even skip calling
`mesh` because `From<MyPrimitive>` is implemented for `Mesh`:

```rust
meshes.add(Circle { radius: 0.5 });
```

I also updated the `2d_shapes` example to use primitives, and tweaked
the colors to have better contrast against the dark background.

Before:

![Old 2D
shapes](https://github.com/bevyengine/bevy/assets/57632562/f1d8c2d5-55be-495f-8ed4-5890154b81ca)

After:

![New 2D
shapes](https://github.com/bevyengine/bevy/assets/57632562/f166c013-34b8-4752-800a-5517b284d978)

Here you can see the UVs and different facing directions: (taken from
#11007, so excuse the 3D primitives at the bottom left)

![UVs and facing
directions](https://github.com/bevyengine/bevy/assets/57632562/eaf0be4e-187d-4b6d-8fb8-c996ba295a8a)

---

## Changelog

- Added `bevy_render::mesh::primitives` module
- Added `Meshable` trait and implemented it for:
  - `Circle`
  - `Ellipse`
  - `Rectangle`
  - `RegularPolygon`
  - `Triangle2d`
- Implemented `Default` and `Copy` for several 2D primitives
- Updated `2d_shapes` example to use primitives
- Tweaked colors in `2d_shapes` example to have better contrast against
the (new-ish) dark background

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-01-29 16:47:47 +00:00
Elabajaba
35ac1b152e
Update to wgpu 0.19 and raw-window-handle 0.6 (#11280)
# Objective

Keep core dependencies up to date.

## Solution

Update the dependencies.

wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was
included in this.

The rwh 0.6 version bump is just the simplest way of doing it. There
might be a way we can take advantage of wgpu's new safe surface creation
api, but I'm not familiar enough with bevy's window management to
untangle it and my attempt ended up being a mess of lifetimes and rustc
complaining about missing trait impls (that were implemented). Thanks to
@MiniaczQ for the (much simpler) rwh 0.6 version bump code.

Unblocks https://github.com/bevyengine/bevy/pull/9172 and
https://github.com/bevyengine/bevy/pull/10812

~~This might be blocked on cpal and oboe updating their ndk versions to
0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested
on android, and everything seems to work correctly (audio properly stops
when minimized, and plays when re-focusing the app).

---

## Changelog

- `wgpu` has been updated to 0.19! The long awaited arcanization has
been merged (for more info, see
https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan
should now be working again on Intel GPUs.
- Targeting WebGPU now requires that you add the new `webgpu` feature
(setting the `RUSTFLAGS` environment variable to
`--cfg=web_sys_unstable_apis` is still required). This feature currently
overrides the `webgl2` feature if you have both enabled (the `webgl2`
feature is enabled by default), so it is not recommended to add it as a
default feature to libraries without putting it behind a flag that
allows library users to opt out of it! In the future we plan on
supporting wasm binaries that can target both webgl2 and webgpu now that
wgpu added support for doing so (see
https://github.com/bevyengine/bevy/issues/11505).
- `raw-window-handle` has been updated to version 0.6.

## Migration Guide

- `bevy_render::instance_index::get_instance_index()` has been removed
as the webgl2 workaround is no longer required as it was fixed upstream
in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed.
- WebGPU now requires the new `webgpu` feature to be enabled. The
`webgpu` feature currently overrides the `webgl2` feature so you no
longer need to disable all default features and re-add them all when
targeting `webgpu`, but binaries built with both the `webgpu` and
`webgl2` features will only target the webgpu backend, and will only
work on browsers that support WebGPU.
- Places where you conditionally compiled things for webgl2 need to be
updated because of this change, eg:
- `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]`
becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"),
feature = "webgpu"))]`
- `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes
`#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature =
"webgpu")))]`
- `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if
cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature =
"webgpu")))`
- `create_texture_with_data` now also takes a `TextureDataOrder`. You
can probably just set this to `TextureDataOrder::default()`
- `TextureFormat`'s `block_size` has been renamed to `block_copy_size`
- See the `wgpu` changelog for anything I might've missed:
https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md

---------

Co-authored-by: François <mockersf@gmail.com>
2024-01-26 18:14:21 +00:00
Mike
ee9a1503ed
Async channel v2 (#10692)
# Objective

- Update async channel to v2.

## Solution

- async channel doesn't support `send_blocking` on wasm anymore. So
don't compile the pipelined rendering plugin on wasm anymore.
- Replaces https://github.com/bevyengine/bevy/pull/10405

## Migration Guide
- The `PipelinedRendering` plugin is no longer exported on wasm. If you
are including it in your wasm builds you should remove it.

```rust
#[cfg(all(not(target_arch = "wasm32"))]
app.add_plugins(bevy_render::pipelined_rendering::PipelinedRenderingPlugin);
```

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-01-15 19:23:00 +00:00
Stepan Koltsov
06bf928927
Option to enable deterministic rendering (#11248)
# Objective

Issue #10243: rendering multiple triangles in the same place results in
flickering.

## Solution

Considered these alternatives:
- `depth_bias` may not work, because of high number of entities, so
creating a material per entity is practically not possible
- rendering at slightly different positions does not work, because when
camera is far, float rounding causes the same issues (edit: assuming we
have to use the same `depth_bias`)
- considered implementing deterministic operation like
`query.par_iter().flat_map(...).collect()` to be used in
`check_visibility` system (which would solve the issue since query is
deterministic), and could not figure out how to make it as cheap as
current approach with thread-local collectors (#11249)

So adding an option to sort entities after `check_visibility` system
run.

Should not be too bad, because after visibility check, only a handful
entities remain.

This is probably not the only source of non-determinism in Bevy, but
this is one I could find so far. At least it fixes the repro example.

## Changelog

- `DeterministicRenderingConfig` option to enable deterministic
rendering

## Test

<img width="1392" alt="image"
src="https://github.com/bevyengine/bevy/assets/28969/c735bce1-3a71-44cd-8677-c19f6c0ee6bd">

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2024-01-09 00:46:01 +00:00
Stepan Koltsov
dfa1a5e547
Explain where rendering is (#11018)
It was not easy to find. Add some pointers to the comment.
2024-01-08 23:02:46 +00:00
JMS55
70b0eacc3b
Keep track of when a texture is first cleared (#10325)
# Objective
- Custom render passes, or future passes in the engine (such as
https://github.com/bevyengine/bevy/pull/10164) need a better way to know
and indicate to the core passes whether the view color/depth/prepass
attachments have been cleared or not yet this frame, to know if they
should clear it themselves or load it.

## Solution

- For all render targets (depth textures, shadow textures, prepass
textures, main textures) use an atomic bool to track whether or not each
texture has been cleared this frame. Abstracted away in the new
ColorAttachment and DepthAttachment wrappers.

---

## Changelog
- Changed `ViewTarget::get_color_attachment()`, removed arguments.
- Changed `ViewTarget::get_unsampled_color_attachment()`, removed
arguments.
- Removed `Camera3d::clear_color`.
- Removed `Camera2d::clear_color`.
- Added `Camera::clear_color`.
- Added `ExtractedCamera::clear_color`.
- Added `ColorAttachment` and `DepthAttachment` wrappers.
- Moved `ClearColor` and `ClearColorConfig` from
`bevy::core_pipeline::clear_color` to `bevy::render::camera`.
- Core render passes now track when a texture is first bound as an
attachment in order to decide whether to clear or load it.

## Migration Guide
- Remove arguments to `ViewTarget::get_color_attachment()` and
`ViewTarget::get_unsampled_color_attachment()`.
- Configure clear color on `Camera` instead of on `Camera3d` and
`Camera2d`.
- Moved `ClearColor` and `ClearColorConfig` from
`bevy::core_pipeline::clear_color` to `bevy::render::camera`.
- `ViewDepthTexture` must now be created via the `new()` method

---------

Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
2023-12-31 00:37:37 +00:00
Mike
6b84ba97a3
Auto insert sync points (#9822)
# Objective

- Users are often confused when their command effects are not visible in
the next system. This PR auto inserts sync points if there are deferred
buffers on a system and there are dependents on that system (systems
with after relationships).
- Manual sync points can lead to users adding more than needed and it's
hard for the user to have a global understanding of their system graph
to know which sync points can be merged. However we can easily calculate
which sync points can be merged automatically.

## Solution

1. Add new edge types to allow opting out of new behavior
2. Insert an sync point for each edge whose initial node has deferred
system params.
3. Reuse nodes if they're at the number of sync points away.

* add opt outs for specific edges with `after_ignore_deferred`,
`before_ignore_deferred` and `chain_ignore_deferred`. The
`auto_insert_apply_deferred` boolean on `ScheduleBuildSettings` can be
set to false to opt out for the whole schedule.

## Perf
This has a small negative effect on schedule build times.
```text
group                                           auto-sync                              main-for-auto-sync
-----                                           -----------                            ------------------
build_schedule/1000_schedule                    1.06       2.8±0.15s        ? ?/sec    1.00       2.7±0.06s        ? ?/sec
build_schedule/1000_schedule_noconstraints      1.01     26.2±0.88ms        ? ?/sec    1.00     25.8±0.36ms        ? ?/sec
build_schedule/100_schedule                     1.02     13.1±0.33ms        ? ?/sec    1.00     12.9±0.28ms        ? ?/sec
build_schedule/100_schedule_noconstraints       1.08   505.3±29.30µs        ? ?/sec    1.00   469.4±12.48µs        ? ?/sec
build_schedule/500_schedule                     1.00    485.5±6.29ms        ? ?/sec    1.00    485.5±9.80ms        ? ?/sec
build_schedule/500_schedule_noconstraints       1.00      6.8±0.10ms        ? ?/sec    1.02      6.9±0.16ms        ? ?/sec
```
---

## Changelog

- Auto insert sync points and added `after_ignore_deferred`,
`before_ignore_deferred`, `chain_no_deferred` and
`auto_insert_apply_deferred` APIs to opt out of this behavior

## Migration Guide

- `apply_deferred` points are added automatically when there is ordering
relationship with a system that has deferred parameters like `Commands`.
If you want to opt out of this you can switch from `after`, `before`,
and `chain` to the corresponding `ignore_deferred` API,
`after_ignore_deferred`, `before_ignore_deferred` or
`chain_ignore_deferred` for your system/set ordering.
- You can also set `ScheduleBuildSettings::auto_insert_sync_points` to
`false` if you want to do it for the whole schedule. Note that in this
mode you can still add `apply_deferred` points manually.
- For most manual insertions of `apply_deferred` you should remove them
as they cannot be merged with the automatically inserted points and
might reduce parallelizability of the system graph.

## TODO
- [x] remove any apply_deferred used in the engine
- [x] ~~decide if we should deprecate manually using apply_deferred.~~
We'll still allow inserting manual sync points for now for whatever edge
cases users might have.
- [x] Update migration guide
- [x] rerun schedule build benchmarks

---------

Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
2023-12-14 16:34:01 +00:00
Elabajaba
70a592f31a
Update to wgpu 0.18 (#10266)
# Objective

Keep up to date with wgpu.

## Solution

Update the wgpu version.

Currently blocked on naga_oil updating to naga 0.14 and releasing a new
version.

3d scenes (or maybe any scene with lighting?) currently don't render
anything due to
```
error: naga_oil bug, please file a report: composer failed to build a valid header: Type [2] '' is invalid
 = Capability Capabilities(CUBE_ARRAY_TEXTURES) is required
 ```

I'm not sure what should be passed in for `wgpu::InstanceFlags`, or if we want to make the gles3minorversion configurable (might be useful for debugging?)

Currently blocked on https://github.com/bevyengine/naga_oil/pull/63, and https://github.com/gfx-rs/wgpu/issues/4569 to be fixed upstream in wgpu first.

## Known issues

Amd+windows+vulkan has issues with texture_binding_arrays (see the image [here](https://github.com/bevyengine/bevy/pull/10266#issuecomment-1819946278)), but that'll be fixed in the next wgpu/naga version, and you can just use dx12 as a workaround for now (Amd+linux mesa+vulkan texture_binding_arrays are fixed though).

---

## Changelog

Updated wgpu to 0.18, naga to 0.14.2, and naga_oil to 0.11.
- Windows desktop GL should now be less painful as it no longer requires Angle.
- You can now toggle shader validation and debug information for debug and release builds using `WgpuSettings.instance_flags` and [InstanceFlags](https://docs.rs/wgpu/0.18.0/wgpu/struct.InstanceFlags.html)

## Migration Guide

- `RenderPassDescriptor` `color_attachments`  (as well as `RenderPassColorAttachment`, and `RenderPassDepthStencilAttachment`) now use `StoreOp::Store` or `StoreOp::Discard` instead of a `boolean` to declare whether or not they should be stored.
- `RenderPassDescriptor` now have `timestamp_writes` and `occlusion_query_set` fields. These can safely be set to `None`.
- `ComputePassDescriptor` now have a `timestamp_writes` field. This can be set to `None` for now.
- See the [wgpu changelog](https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md#v0180-2023-10-25) for additional details
2023-12-14 02:45:47 +00:00
tygyh
fd308571c4
Remove unnecessary path prefixes (#10749)
# Objective

- Shorten paths by removing unnecessary prefixes

## Solution

- Remove the prefixes from many paths which do not need them. Finding
the paths was done automatically using built-in refactoring tools in
Jetbrains RustRover.
2023-11-28 23:43:40 +00:00
TheBigCheese
e67cfdf82b
Enable clippy::undocumented_unsafe_blocks warning across the workspace (#10646)
# Objective

Enables warning on `clippy::undocumented_unsafe_blocks` across the
workspace rather than only in `bevy_ecs`, `bevy_transform` and
`bevy_utils`. This adds a little awkwardness in a few areas of code that
have trivial safety or explain safety for multiple unsafe blocks with
one comment however automatically prevents these comments from being
missed.

## Solution

This adds `undocumented_unsafe_blocks = "warn"` to the workspace
`Cargo.toml` and fixes / adds a few missed safety comments. I also added
`#[allow(clippy::undocumented_unsafe_blocks)]` where the safety is
explained somewhere above.

There are a couple of safety comments I added I'm not 100% sure about in
`bevy_animation` and `bevy_render/src/view` and I'm not sure about the
use of `#[allow(clippy::undocumented_unsafe_blocks)]` compared to adding
comments like `// SAFETY: See above`.
2023-11-21 02:06:24 +00:00
Ame
951c9bb1a2
Add [lints] table, fix adding #![allow(clippy::type_complexity)] everywhere (#10011)
# Objective

- Fix adding `#![allow(clippy::type_complexity)]` everywhere. like #9796

## Solution

- Use the new [lints] table that will land in 1.74
(https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#lints)
- inherit lint to the workspace, crates and examples.
```
[lints]
workspace = true
```

## Changelog

- Bump rust version to 1.74
- Enable lints table for the workspace
```toml
[workspace.lints.clippy]
type_complexity = "allow"
```
- Allow type complexity for all crates and examples
```toml
[lints]
workspace = true
```

---------

Co-authored-by: Martín Maita <47983254+mnmaita@users.noreply.github.com>
2023-11-18 20:58:48 +00:00
Edgar Geier
a830530be4
Replace all labels with interned labels (#7762)
# Objective

First of all, this PR took heavy inspiration from #7760 and #5715. It
intends to also fix #5569, but with a slightly different approach.


This also fixes #9335 by reexporting `DynEq`.

## Solution

The advantage of this API is that we can intern a value without
allocating for zero-sized-types and for enum variants that have no
fields. This PR does this automatically in the `SystemSet` and
`ScheduleLabel` derive macros for unit structs and fieldless enum
variants. So this should cover many internal and external use cases of
`SystemSet` and `ScheduleLabel`. In these optimal use cases, no memory
will be allocated.

- The interning returns a `Interned<dyn SystemSet>`, which is just a
wrapper around a `&'static dyn SystemSet`.
- `Hash` and `Eq` are implemented in terms of the pointer value of the
reference, similar to my first approach of anonymous system sets in
#7676.
- Therefore, `Interned<T>` does not implement `Borrow<T>`, only `Deref`.
- The debug output of `Interned<T>` is the same as the interned value.

Edit: 
- `AppLabel` is now also interned and the old
`derive_label`/`define_label` macros were replaced with the new
interning implementation.
- Anonymous set ids are reused for different `Schedule`s, reducing the
amount of leaked memory.

### Pros
- `InternedSystemSet` and `InternedScheduleLabel` behave very similar to
the current `BoxedSystemSet` and `BoxedScheduleLabel`, but can be copied
without an allocation.
- Many use cases don't allocate at all.
- Very fast lookups and comparisons when using `InternedSystemSet` and
`InternedScheduleLabel`.
- The `intern` module might be usable in other areas.
- `Interned{ScheduleLabel, SystemSet, AppLabel}` does implement
`{ScheduleLabel, SystemSet, AppLabel}`, increasing ergonomics.

### Cons
- Implementors of `SystemSet` and `ScheduleLabel` still need to
implement `Hash` and `Eq` (and `Clone`) for it to work.

## Changelog

### Added

- Added `intern` module to `bevy_utils`.
- Added reexports of `DynEq` to `bevy_ecs` and `bevy_app`.

### Changed

- Replaced `BoxedSystemSet` and `BoxedScheduleLabel` with
`InternedSystemSet` and `InternedScheduleLabel`.
- Replaced `impl AsRef<dyn ScheduleLabel>` with `impl ScheduleLabel`.
- Replaced `AppLabelId` with `InternedAppLabel`.
- Changed `AppLabel` to use `Debug` for error messages.
- Changed `AppLabel` to use interning.
- Changed `define_label`/`derive_label` to use interning. 
- Replaced `define_boxed_label`/`derive_boxed_label` with
`define_label`/`derive_label`.
- Changed anonymous set ids to be only unique inside a schedule, not
globally.
- Made interned label types implement their label trait. 

### Removed

- Removed `define_boxed_label` and `derive_boxed_label`. 

## Migration guide

- Replace `BoxedScheduleLabel` and `Box<dyn ScheduleLabel>` with
`InternedScheduleLabel` or `Interned<dyn ScheduleLabel>`.
- Replace `BoxedSystemSet` and `Box<dyn SystemSet>` with
`InternedSystemSet` or `Interned<dyn SystemSet>`.
- Replace `AppLabelId` with `InternedAppLabel` or `Interned<dyn
AppLabel>`.
- Types manually implementing `ScheduleLabel`, `AppLabel` or `SystemSet`
need to implement:
  - `dyn_hash` directly instead of implementing `DynHash`
  - `as_dyn_eq`
- Pass labels to `World::try_schedule_scope`, `World::schedule_scope`,
`World::try_run_schedule`. `World::run_schedule`, `Schedules::remove`,
`Schedules::remove_entry`, `Schedules::contains`, `Schedules::get` and
`Schedules::get_mut` by value instead of by reference.

---------

Co-authored-by: Joseph <21144246+JoJoJet@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2023-10-25 21:39:23 +00:00
Bruce Mitchener
6bd3cca0ca
Improve linking within RenderSet docs. (#10143)
# Objective

- Improve formatting and linking within `RenderSet` docs.

## Solution

- Used backticks and intradoc links.
2023-10-16 17:13:20 +00:00
Jan Češpivo
4a61f894b7
chore: Renamed RenderInstance trait to ExtractInstance (#10065)
# Objective

Fixes [#10061]

## Solution

Renamed `RenderInstance` to `ExtractInstance`, `RenderInstances` to
`ExtractedInstances` and `RenderInstancePlugin` to
`ExtractInstancesPlugin`
2023-10-13 17:06:53 +00:00
Patrick Walton
e67d63aa79
Refactor the render instance logic in #9903 so that it's easier for other components to adopt. (#10002)
# Objective

Currently, the only way for custom components that participate in
rendering to opt into the higher-performance extraction method in #9903
is to implement the `RenderInstances` data structure and the extraction
logic manually. This is inconvenient compared to the `ExtractComponent`
API.

## Solution

This commit creates a new `RenderInstance` trait that mirrors the
existing `ExtractComponent` method but uses the higher-performance
approach that #9903 uses. Additionally, `RenderInstance` is more
flexible than `ExtractComponent`, because it can extract multiple
components at once. This makes high-performance rendering components
essentially as easy to write as the existing ones based on component
extraction.

---

## Changelog

### Added

A new `RenderInstance` trait is available mirroring `ExtractComponent`,
but using a higher-performance method to extract one or more components
to the render world. If you have custom components that rendering takes
into account, you may consider migration from `ExtractComponent` to
`RenderInstance` for higher performance.
2023-10-08 10:34:44 +00:00
Mike
687e379800
Updates for rust 1.73 (#10035)
# Objective

- Updates for rust 1.73

## Solution

- new doc check for `redundant_explicit_links`
- updated to text for compile fail tests

---

## Changelog

- updates for rust 1.73
2023-10-06 00:31:10 +00:00
piper
bc88f33e48
Allow other plugins to create renderer resources (#9925)
This is a duplicate of #9632, it was created since I forgot to make a
new branch when I first made this PR, so I was having trouble resolving
merge conflicts, meaning I had to rebuild my PR.

# Objective

- Allow other plugins to create the renderer resources. An example of
where this would be required is my [OpenXR
plugin](https://github.com/awtterpip/bevy_openxr)

## Solution

- Changed the bevy RenderPlugin to optionally take precreated render
resources instead of a configuration.

## Migration Guide

The `RenderPlugin` now takes a `RenderCreation` enum instead of
`WgpuSettings`. `RenderSettings::default()` returns
`RenderSettings::Automatic(WgpuSettings::default())`. `RenderSettings`
also implements `From<WgpuSettings>`.

```rust
// before
RenderPlugin {
    wgpu_settings: WgpuSettings {
    ...
    },
}

// now
RenderPlugin {
    render_creation: RenderCreation::Automatic(WgpuSettings {
    ...
    }),
}
// or
RenderPlugin {
    render_creation: WgpuSettings {
    ...
    }.into(),
}
```

---------

Co-authored-by: Malek <pocmalek@gmail.com>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
2023-09-26 19:35:08 +00:00
Robert Swain
5c884c5a15
Automatic batching/instancing of draw commands (#9685)
# Objective

- Implement the foundations of automatic batching/instancing of draw
commands as the next step from #89
- NOTE: More performance improvements will come when more data is
managed and bound in ways that do not require rebinding such as mesh,
material, and texture data.

## Solution

- The core idea for batching of draw commands is to check whether any of
the information that has to be passed when encoding a draw command
changes between two things that are being drawn according to the sorted
render phase order. These should be things like the pipeline, bind
groups and their dynamic offsets, index/vertex buffers, and so on.
  - The following assumptions have been made:
- Only entities with prepared assets (pipelines, materials, meshes) are
queued to phases
- View bindings are constant across a phase for a given draw function as
phases are per-view
- `batch_and_prepare_render_phase` is the only system that performs this
batching and has sole responsibility for preparing the per-object data.
As such the mesh binding and dynamic offsets are assumed to only vary as
a result of the `batch_and_prepare_render_phase` system, e.g. due to
having to split data across separate uniform bindings within the same
buffer due to the maximum uniform buffer binding size.
- Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform
in arrays in GPU buffers rather than each one being at a dynamic offset
in a uniform buffer. This is the same optimisation that was made for 3D
not long ago.
- Change batch size for a range in `PhaseItem`, adding API for getting
or mutating the range. This is more flexible than a size as the length
of the range can be used in place of the size, but the start and end can
be otherwise whatever is needed.
- Add an optional mesh bind group dynamic offset to `PhaseItem`. This
avoids having to do a massive table move just to insert
`GpuArrayBufferIndex` components.

## Benchmarks

All tests have been run on an M1 Max on AC power. `bevymark` and
`many_cubes` were modified to use 1920x1080 with a scale factor of 1. I
run a script that runs a separate Tracy capture process, and then runs
the bevy example with `--features bevy_ci_testing,trace_tracy` and
`CI_TESTING_CONFIG=../benchmark.ron` with the contents of
`../benchmark.ron`:
```rust
(
    exit_after: Some(1500)
)
```
...in order to run each test for 1500 frames.

The recent changes to `many_cubes` and `bevymark` added reproducible
random number generation so that with the same settings, the same rng
will occur. They also added benchmark modes that use a fixed delta time
for animations. Combined this means that the same frames should be
rendered both on main and on the branch.

The graphs compare main (yellow) to this PR (red).

### 3D Mesh `many_cubes --benchmark`

<img width="1411" alt="Screenshot 2023-09-03 at 23 42 10"
src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4">
The mesh and material are the same for all instances. This is basically
the best case for the initial batching implementation as it results in 1
draw for the ~11.7k visible meshes. It gives a ~30% reduction in median
frame time.

The 1000th frame is identical using the flip tool:

![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643)

```
     Mean: 0.000000
     Weighted median: 0.000000
     1st weighted quartile: 0.000000
     3rd weighted quartile: 0.000000
     Min: 0.000000
     Max: 0.000000
     Evaluation time: 0.4615 seconds
```

### 3D Mesh `many_cubes --benchmark --material-texture-count 10`

<img width="1404" alt="Screenshot 2023-09-03 at 23 45 18"
src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf">
This run uses 10 different materials by varying their textures. The
materials are randomly selected, and there is no sorting by material
bind group for opaque 3D so any batching is 'random'. The PR produces a
~5% reduction in median frame time. If we were to sort the opaque phase
by the material bind group, then this should be a lot faster. This
produces about 10.5k draws for the 11.7k visible entities. This makes
sense as randomly selecting from 10 materials gives a chance that two
adjacent entities randomly select the same material and can be batched.

The 1000th frame is identical in flip:

![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10
67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb)

```
     Mean: 0.000000
     Weighted median: 0.000000
     1st weighted quartile: 0.000000
     3rd weighted quartile: 0.000000
     Min: 0.000000
     Max: 0.000000
     Evaluation time: 0.4537 seconds
```

### 3D Mesh `many_cubes --benchmark --vary-per-instance`

<img width="1394" alt="Screenshot 2023-09-03 at 23 48 44"
src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f">
This run varies the material data per instance by randomly-generating
its colour. This is the worst case for batching and that it performs
about the same as `main` is a good thing as it demonstrates that the
batching has minimal overhead when dealing with ~11k visible mesh
entities.

The 1000th frame is identical according to flip:

![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c)

```
     Mean: 0.000000
     Weighted median: 0.000000
     1st weighted quartile: 0.000000
     3rd weighted quartile: 0.000000
     Min: 0.000000
     Max: 0.000000
     Evaluation time: 0.4568 seconds
```

### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d`

<img width="1412" alt="Screenshot 2023-09-03 at 23 59 56"
src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4">
This spawns 160 waves of 1000 quad meshes that are shaded with
ColorMaterial. Each wave has a different material so 160 waves currently
should result in 160 batches. This results in a 50% reduction in median
frame time.

Capturing a screenshot of the 1000th frame main vs PR gives:

![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40)

```
     Mean: 0.001222
     Weighted median: 0.750432
     1st weighted quartile: 0.453494
     3rd weighted quartile: 0.969758
     Min: 0.000000
     Max: 0.990296
     Evaluation time: 0.4255 seconds
```

So they seem to produce the same results. I also double-checked the
number of draws. `main` does 160000 draws, and the PR does 160, as
expected.

### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d --material-texture-count 10`

<img width="1392" alt="Screenshot 2023-09-04 at 00 09 22"
src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c">
This generates 10 textures and generates materials for each of those and
then selects one material per wave. The median frame time is reduced by
50%. Similar to the plain run above, this produces 160 draws on the PR
and 160000 on `main` and the 1000th frame is identical (ignoring the fps
counter text overlay).

![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f)

```
     Mean: 0.002877
     Weighted median: 0.964980
     1st weighted quartile: 0.668871
     3rd weighted quartile: 0.982749
     Min: 0.000000
     Max: 0.992377
     Evaluation time: 0.4301 seconds
```

### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode
mesh2d --vary-per-instance`

<img width="1396" alt="Screenshot 2023-09-04 at 00 13 53"
src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb">
This creates unique materials per instance by randomly-generating the
material's colour. This is the worst case for 2D batching. Somehow, this
PR manages a 7% reduction in median frame time. Both main and this PR
issue 160000 draws.

The 1000th frame is the same:

![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd
ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97)

```
     Mean: 0.001214
     Weighted median: 0.937499
     1st weighted quartile: 0.635467
     3rd weighted quartile: 0.979085
     Min: 0.000000
     Max: 0.988971
     Evaluation time: 0.4462 seconds
```

### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite`

<img width="1396" alt="Screenshot 2023-09-04 at 12 21 12"
src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43">
This just spawns 160 waves of 1000 sprites. There should be and is no
notable difference between main and the PR.

### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite --material-texture-count 10`

<img width="1389" alt="Screenshot 2023-09-04 at 12 36 08"
src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413">
This spawns the sprites selecting a texture at random per instance from
the 10 generated textures. This has no significant change vs main and
shouldn't.

### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode
sprite --vary-per-instance`

<img width="1401" alt="Screenshot 2023-09-04 at 12 29 52"
src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6">
This sets the sprite colour as being unique per instance. This can still
all be drawn using one batch. There should be no difference but the PR
produces median frame times that are 4% higher. Investigation showed no
clear sources of cost, rather a mix of give and take that should not
happen. It seems like noise in the results.

### Summary

| Benchmark  | % change in median frame time |
| ------------- | ------------- |
| many_cubes  | 🟩 -30%  |
| many_cubes 10 materials  | 🟩 -5%  |
| many_cubes unique materials  | 🟩 ~0%  |
| bevymark mesh2d  | 🟩 -50%  |
| bevymark mesh2d 10 materials  | 🟩 -50%  |
| bevymark mesh2d unique materials  | 🟩 -7%  |
| bevymark sprite  | 🟥 2%  |
| bevymark sprite 10 materials  | 🟥 0.6%  |
| bevymark sprite unique materials  | 🟥 4.1%  |

---

## Changelog

- Added: 2D and 3D mesh entities that share the same mesh and material
(same textures, same data) are now batched into the same draw command
for better performance.

---------

Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Nicola Papale <nico@nicopap.ch>
2023-09-21 22:12:34 +00:00