Commit Graph

11 Commits

Author SHA1 Message Date
Erik Živković
72aaa41603
Remove render_resource_wrapper (#15441)
# Objective

* Remove all uses of render_resource_wrapper.
* Make it easier to share a `wgpu::Device` between Bevy and application
code.

## Solution

Removed the `render_resource_wrapper` macro.

To improve the `RenderCreation::Manual` API, `ErasedRenderDevice` was
replaced by `Arc`. Unfortunately I had to introduce one more usage of
`WgpuWrapper`, which seems like an unwanted constraint on the caller.
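
As an illustration of the sharing this enables, here is a minimal sketch (not this PR's code): the application keeps `Arc` clones of the `wgpu` device and queue and hands the other clones to Bevy's manual render creation. `AppRenderer` and `split_handles` are hypothetical names, and the exact `RenderCreation::Manual` argument list is deliberately not reproduced here.

```rust
use std::sync::Arc;

/// Hypothetical application-side handle to the shared GPU objects.
struct AppRenderer {
    device: Arc<wgpu::Device>,
    queue: Arc<wgpu::Queue>,
}

/// Wrap the raw wgpu handles in `Arc`s and split them into an
/// application-side copy and a copy destined for Bevy's render world.
fn split_handles(
    device: wgpu::Device,
    queue: wgpu::Queue,
) -> (AppRenderer, (Arc<wgpu::Device>, Arc<wgpu::Queue>)) {
    let device = Arc::new(device);
    let queue = Arc::new(queue);
    // One set of clones stays with the application...
    let app = AppRenderer {
        device: Arc::clone(&device),
        queue: Arc::clone(&queue),
    };
    // ...and the other set would be passed into `RenderCreation::Manual`
    // (argument list omitted; see the Bevy docs for the exact signature).
    (app, (device, queue))
}
```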

## Testing

- Did you test these changes? If so, how?
    - Ran `cargo test`.
    - Ran a few examples.
    - Used `RenderCreation::Manual` in my own project
    - Exercised `RenderCreation::Automatic` through examples

- Are there any parts that need more testing?
    - No

- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
    - Run examples
    - Use `RenderCreation::Manual` in their own project
2024-09-30 17:37:07 +00:00
Zachary Harrold
d70595b667
Add core and alloc over std Lints (#15281)
# Objective

- Fixes #6370
- Closes #6581

## Solution

- Added the following lints to the workspace:
  - `std_instead_of_core`
  - `std_instead_of_alloc`
  - `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root; see the sketch after this list).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.
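
As a rough illustration of the style these lints enforce, here is a minimal sketch (not code from this PR; the function names are made up):

```rust
#![no_std] // not required by the lints; used here so the imports must resolve

extern crate alloc; // needed once `alloc` types are used in a `no_std` crate

use alloc::{string::String, vec::Vec};
use core::mem;

/// `core::mem::take` instead of `std::mem::take`.
pub fn take_names(buffer: &mut Vec<String>) -> Vec<String> {
    mem::take(buffer)
}

/// `alloc::string::String` instead of `std::string::String`.
pub fn label(name: &str) -> String {
    String::from(name)
}
```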

## Testing

- Ran CI locally

## Migration Guide

The MSRV is now 1.81. Please update to this version or higher.

## Notes

- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
Zachary Harrold
bc13161416
Migrated NonZero* to NonZero<*> (#14978)
# Objective

- Fixes #14974

## Solution

- Replace all* instances of `NonZero*` with `NonZero<*>` (illustrated below)
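
A minimal before/after sketch for illustration (not code from the PR):

```rust
use core::num::{NonZero, NonZeroU32};

// Before: the per-width alias.
fn before(len: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(len)
}

// After: the generic form. `NonZeroU32` is now an alias for `NonZero<u32>`,
// so the two signatures name the same type.
fn after(len: u32) -> Option<NonZero<u32>> {
    NonZero::new(len)
}

fn main() {
    assert_eq!(before(3), after(3));
}
```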

## Testing

- CI passed locally.

---

## Notes

Within the `bevy_reflect` implementations for `std` types,
`impl_reflect_value!()` will continue to use the type aliases instead,
as it inappropriately parses the concrete type parameter as a generic
argument. If the `ZeroablePrimitive` trait was stable, or the macro
could be modified to accept a finite list of types, then we could fully
migrate.
2024-08-30 02:37:47 +00:00
Patrick Walton
4dadebd9c4
Improve performance by binning together opaque items instead of sorting them. (#12453)
Today, we sort all entities added to all phases, even the phases that
don't strictly need sorting, such as the opaque and shadow phases. This
results in a performance loss because our `PhaseItem`s are rather large
in memory, so sorting is slow. Additionally, determining the boundaries
of batches is an O(n) process.

This commit instead makes Bevy place phase items into *bins*
keyed by *bin keys*, which have the invariant that everything in the
same bin is potentially batchable. This makes determining batch
boundaries O(1), because everything in the same bin can be batched.
Instead of sorting each entity, we now sort only the bin keys. This
drops the sorting time to near-zero on workloads with few bins like
`many_cubes --no-frustum-culling`. Memory usage is improved too, with
batch boundaries and dynamic indices now implicit instead of explicit.
The improved memory usage results in a significant win even on
unbatchable workloads like `many_cubes --no-frustum-culling
--vary-material-data-per-instance`, presumably due to cache effects.

Not all phases can be binned; some, such as transparent and transmissive
phases, must still be sorted. To handle this, this commit splits
`PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the
logic that today deals with `PhaseItem`s has been moved to
`SortedPhaseItem`. `BinnedPhaseItem` has the new logic.
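
To make the bin/sort distinction concrete, here is a heavily simplified sketch (illustrative stand-in types, not Bevy's actual `BinnedPhaseItem` machinery):

```rust
use std::collections::BTreeMap;

type Entity = u64;

/// Everything that shares a bin key is assumed to be batchable.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct BinKey {
    pipeline_id: u32,
    material_id: u32,
    mesh_id: u32,
}

#[derive(Default)]
struct BinnedPhase {
    bins: BTreeMap<BinKey, Vec<Entity>>,
}

impl BinnedPhase {
    fn add(&mut self, key: BinKey, entity: Entity) {
        // Cost scales with the number of bins, not the number of entities.
        self.bins.entry(key).or_default().push(entity);
    }

    fn iter_batches(&self) -> impl Iterator<Item = (&BinKey, &[Entity])> + '_ {
        // The BTreeMap keeps keys ordered, so batch boundaries fall out of
        // the bin structure; no per-entity sort is needed.
        self.bins.iter().map(|(key, entities)| (key, entities.as_slice()))
    }
}

fn main() {
    let mut phase = BinnedPhase::default();
    let key = BinKey { pipeline_id: 0, material_id: 1, mesh_id: 2 };
    phase.add(key, 100);
    phase.add(key, 101);
    // Both entities share a bin, so they form a single batch.
    assert_eq!(phase.iter_batches().count(), 1);
}
```

The point is that adding an item touches only its bin, and iterating bins in key order yields ready-made batches, whereas a sorted phase must order every item individually.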

Frame time results (in ms/frame) are as follows:

| Benchmark              | `binning` | `main`  | Speedup |
| ---------------------- | --------- | ------- | ------- |
| `many_cubes -nfc -vpi` | 232.179   | 312.123 | 34.43%  |
| `many_cubes -nfc`      | 25.874    | 30.117  | 16.40%  |
| `many_foxes`           | 3.276     | 3.515   | 7.30%   |

(`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for
`--vary-per-instance`.)

---

## Changelog

### Changed

* Render phases have been split into binned and sorted phases. Binned
phases, such as the common opaque phase, achieve improved CPU
performance by avoiding the sorting step.

## Migration Guide

- `PhaseItem` has been split into `BinnedPhaseItem` and
`SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need
to migrate them to one of these two types. `SortedPhaseItem` requires
the fewest code changes, but you may want to pick `BinnedPhaseItem` if
your phase doesn't require sorting, as that enables higher performance.

## Tracy graphs

`many-cubes --no-frustum-culling`, `main` branch:
<img width="1064" alt="Screenshot 2024-03-12 180037"
src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55">

`many-cubes --no-frustum-culling`, this branch:
<img width="1064" alt="Screenshot 2024-03-12 180011"
src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c">

You can see that `batch_and_prepare_binned_render_phase` is a much
smaller fraction of the time. Zooming in on that function, with yellow
being this branch and red being `main`, we see:

<img width="1064" alt="Screenshot 2024-03-12 175832"
src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0">

The binning happens in `queue_material_meshes`. Again with yellow being
this branch and red being `main`:
<img width="1064" alt="Screenshot 2024-03-12 175755"
src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96">

We can see that there is a small regression in `queue_material_meshes`
performance, but it's not nearly enough to outweigh the large gains in
`batch_and_prepare_binned_render_phase`.

---------

Co-authored-by: James Liu <contact@jamessliu.com>
2024-03-30 02:55:02 +00:00
Ian Kettlewell
b35974010b
Get Bevy building for WebAssembly with multithreading (#12205)
# Objective

This gets Bevy building on Wasm when the `atomics` flag is enabled. This
does not yet multithread Bevy itself, but it allows Bevy users to use a
crate like `wasm_thread` to spawn their own threads and manually
parallelize work. This is a first step towards resolving #4078. It also
fixes #9304.

This provides a foothold so that Bevy contributors can begin to think
about multithreaded Wasm's constraints and Bevy can work towards changes
to get the engine itself multithreaded.

Some flags need to be set on the Rust compiler when compiling for Wasm
multithreading. Here's what my build script looks like, with the correct
flags set, to test out Bevy examples on web:

```bash
set -e
RUSTFLAGS='-C target-feature=+atomics,+bulk-memory,+mutable-globals' \
    cargo build --example breakout --target wasm32-unknown-unknown -Z build-std=std,panic_abort --release
wasm-bindgen --out-name wasm_example \
    --out-dir examples/wasm/target \
    --target web target/wasm32-unknown-unknown/release/examples/breakout.wasm
devserver --header Cross-Origin-Opener-Policy='same-origin' --header Cross-Origin-Embedder-Policy='require-corp' --path examples/wasm
```

A few notes:

1. `cpal` crashes immediately when the `atomics` flag is set. That is
patched in https://github.com/RustAudio/cpal/pull/837, but not yet in
the latest crates.io release.

That can be temporarily worked around by patching Cpal like so:
```toml
[patch.crates-io]
cpal = { git = "https://github.com/RustAudio/cpal" }
```

2. When testing out `wasm_thread` you need to enable the `es_modules`
feature.

## Solution

The largest obstacle to compiling Bevy with `atomics` on web is that
`wgpu` types are _not_ Send and Sync. Longer term Bevy will need an
approach to handle that, but in the near term Bevy is already configured
to be single-threaded on web.

Therefore it is enough to wrap `wgpu` types in a
`send_wrapper::SendWrapper` that _is_ Send / Sync, but panics if
accessed off the `wgpu` thread.
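
A simplified sketch of that wrapping approach (the `NotSend`/`WrappedResource` types are stand-ins, not Bevy's actual wrappers; `send_wrapper` is the real crate named above):

```rust
use send_wrapper::SendWrapper;

/// Stand-in for a wgpu handle: raw pointers are neither Send nor Sync.
struct NotSend {
    raw: *const (),
}

/// Send + Sync wrapper: safe to store in thread-safe containers, but the
/// inner value panics if dereferenced from a thread other than its creator.
struct WrappedResource(SendWrapper<NotSend>);

fn wrap(value: NotSend) -> WrappedResource {
    WrappedResource(SendWrapper::new(value))
}

fn use_on_render_thread(res: &WrappedResource) -> *const () {
    // Deref panics at runtime if this runs on the wrong thread.
    res.0.raw
}

fn main() {
    let wrapped = wrap(NotSend { raw: std::ptr::null() });
    // Fine on the creating thread; would panic if called from another thread.
    let _ = use_on_render_thread(&wrapped);
}
```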

---

## Changelog

- `wgpu` types that are not `Send` are wrapped in
`send_wrapper::SendWrapper` on Wasm + 'atomics'
- CommandBuffers are not generated in parallel on Wasm + 'atomics'

## Questions
- Bevy should probably add CI checks to make sure this doesn't regress.
Should that go in this PR or a separate PR? **Edit:** Added checks to
build Wasm with atomics

---------

Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: daxpedda <daxpedda@gmail.com>
Co-authored-by: François <francois.mockers@vleue.com>
2024-03-25 19:10:18 +00:00
James Liu
6d547d7ce6
Allow Mesh-related queue phase systems to parallelize (#11804)
# Objective
Partially addresses #3548. `queue_shadows` and `queue_material_meshes`
cannot parallelize because of the `ResMut<RenderMeshInstances>`
parameter for `queue_material_meshes`.

## Solution
Change the `material_bind_group` field to use atomics instead of needing
full mutable access. Change the `ResMut` to a `Res`, which should allow
both sets of systems to parallelize without issue.
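
As a loose illustration of that change (simplified stand-in types, not the actual `RenderMeshInstances` code), a field stored as an atomic can be written through `&self`, which is what lets the systems take `Res<...>` instead of `ResMut<...>`:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Sentinel meaning "no bind group assigned yet".
const UNSET: u32 = u32::MAX;

struct MeshInstance {
    /// Written through `&self`, so queue systems only need shared access.
    material_bind_group_id: AtomicU32,
}

impl MeshInstance {
    fn new() -> Self {
        Self {
            material_bind_group_id: AtomicU32::new(UNSET),
        }
    }

    fn set_bind_group(&self, id: u32) {
        // Relaxed is assumed sufficient here because the value is only read
        // after the queue systems have finished (sketch-level assumption).
        self.material_bind_group_id.store(id, Ordering::Relaxed);
    }

    fn bind_group(&self) -> Option<u32> {
        match self.material_bind_group_id.load(Ordering::Relaxed) {
            UNSET => None,
            id => Some(id),
        }
    }
}

fn main() {
    let instance = MeshInstance::new();
    assert_eq!(instance.bind_group(), None);
    instance.set_bind_group(3);
    assert_eq!(instance.bind_group(), Some(3));
}
```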

## Performance
Tested against `many_foxes`, this has a significant improvement over the
entire render schedule. (Yellow is this PR, red is main)

![image](https://github.com/bevyengine/bevy/assets/3137680/6cc7f346-4f50-4f12-a383-682a9ce1daf6)

The use of atomics does seem to have a negative effect on
`queue_material_meshes` (roughly an 8.29% increase in time spent in the
system).

![image](https://github.com/bevyengine/bevy/assets/3137680/7907079a-863d-4760-aa5b-df68c006ea36)

`queue_shadows` seems to be ever so slightly slower (1.6% more time
spent) in the system.

![image](https://github.com/bevyengine/bevy/assets/3137680/6d90af73-b922-45e4-bae5-df200e8b9784)

`batch_and_prepare_render_phase` seems to be a mix, but overall seems to
be slightly *faster* by about 5%.

![image](https://github.com/bevyengine/bevy/assets/3137680/fac638ff-8c90-436b-9362-c6209b18957c)
2024-02-20 00:12:41 +00:00
TheBigCheese
e67cfdf82b
Enable clippy::undocumented_unsafe_blocks warning across the workspace (#10646)
# Objective

Enables warning on `clippy::undocumented_unsafe_blocks` across the
workspace rather than only in `bevy_ecs`, `bevy_transform` and
`bevy_utils`. This adds a little awkwardness in a few areas of code that
have trivial safety or that explain safety for multiple unsafe blocks
with one comment, but it automatically prevents these comments from
being missed.

## Solution

This adds `undocumented_unsafe_blocks = "warn"` to the workspace
`Cargo.toml` and fixes / adds a few missed safety comments. I also added
`#[allow(clippy::undocumented_unsafe_blocks)]` where the safety is
explained somewhere above.
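
For reference, a small made-up example of the two styles discussed: a documented `unsafe` block versus an `allow` where the justification lives nearby:

```rust
/// Returns the first element without bounds checking.
///
/// # Safety
/// `slice` must be non-empty.
unsafe fn first_unchecked(slice: &[u32]) -> u32 {
    // SAFETY: the caller guarantees `slice` is non-empty, so index 0 is valid.
    unsafe { *slice.get_unchecked(0) }
}

fn sum_first_two(slice: &[u32]) -> u32 {
    assert!(slice.len() >= 2);
    // The bounds are justified by the assert above, so the lint is allowed
    // here rather than repeating the same SAFETY comment.
    #[allow(clippy::undocumented_unsafe_blocks)]
    let sum = unsafe { *slice.get_unchecked(0) + *slice.get_unchecked(1) };
    sum
}

fn main() {
    let data = [3_u32, 4, 5];
    // SAFETY: `data` is non-empty.
    assert_eq!(unsafe { first_unchecked(&data) }, 3);
    assert_eq!(sum_first_two(&data), 7);
}
```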

There are a couple of safety comments I added I'm not 100% sure about in
`bevy_animation` and `bevy_render/src/view` and I'm not sure about the
use of `#[allow(clippy::undocumented_unsafe_blocks)]` compared to adding
comments like `// SAFETY: See above`.
2023-11-21 02:06:24 +00:00
robtfm
8f81be9845 remove potential ub in render_resource_wrapper (#7279)
# Objective

[as noted](https://github.com/bevyengine/bevy/pull/5950#discussion_r1080762807) by james, transmuting arcs may be UB.
 
we now store a `*const ()` pointer internally, and only rely on `ptr.cast::<()>().cast::<T>() == ptr`.

as a happy side effect this removes the need for boxing the value, so todo: potentially use this for release mode as well
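
for illustration, a rough sketch of the pointer-cast idea (names here are made up, not the actual wrapper code):

```rust
use std::marker::PhantomData;
use std::sync::Arc;

struct ErasedArc<T> {
    ptr: *const (),
    _marker: PhantomData<T>,
}

impl<T> ErasedArc<T> {
    fn erase(value: Arc<T>) -> Self {
        Self {
            // `Arc::into_raw` hands over the refcount; no transmute involved.
            ptr: Arc::into_raw(value).cast::<()>(),
            _marker: PhantomData,
        }
    }

    fn restore(self) -> Arc<T> {
        // SAFETY: `ptr` came from `Arc::into_raw::<T>` in `erase`, and casting
        // back through `()` yields the same pointer value.
        unsafe { Arc::from_raw(self.ptr.cast::<T>()) }
    }
}

// note: dropping an ErasedArc without calling `restore` would leak the Arc;
// a real wrapper would also implement Drop/Clone (omitted to keep this short).
```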
2023-02-06 17:14:00 +00:00
James Liu
bef9bc1844 Reduce branching in TrackedRenderPass (#7053)
# Objective
Speed up the render phase for rendering.

## Solution
 - Follow up on #6988 and make the internals of atomic IDs `NonZeroU32`. This niches the `Option`s of the IDs in draw state, which reduces the size and branching behavior when evaluating for equality (see the sketch after this list).
 - Require `&RenderDevice` to get the device's `Limits` when initializing a `TrackedRenderPass` to preallocate the bind groups and vertex buffer state in `DrawState`, this removes the branch on needing to resize those `Vec`s.
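
A quick illustration of the niche optimization the first point relies on (not PR code):

```rust
use core::num::NonZeroU32;

fn main() {
    // The zero bit pattern encodes `None`, so the Option adds no extra bytes
    // and comparing two IDs in draw state stays a plain integer comparison.
    assert_eq!(core::mem::size_of::<Option<NonZeroU32>>(), 4);
    assert_eq!(core::mem::size_of::<Option<u32>>(), 8);

    let id = NonZeroU32::new(7);
    assert_eq!(id.map(NonZeroU32::get), Some(7));
}
```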

## Performance
This produces a speedup akin to that of #6885: an approximate 6% speedup in `main_opaque_pass_3d` on `many_foxes` (408.79us -> 388us). This should be orthogonal to the gains seen there.

![image](https://user-images.githubusercontent.com/3137680/209906239-e430f026-63c2-4b95-957e-a2045b810d79.png)

---

## Changelog
Added: `RenderContext::begin_tracked_render_pass`.
Changed: `TrackedRenderPass` now requires a `&RenderDevice` on construction.
Removed: `bevy_render::render_phase::DrawState`. It was not usable in any form outside of `bevy_render`.

## Migration Guide
TODO
2023-01-09 19:24:56 +00:00
Kurt Kühnert
965ebeff59 Replace UUID-based IDs with atomic-counted ones (#6988)
# Objective

- alternative to #2895 
- as mentioned in #2535 the uuid based ids in the render module should be replaced with atomic-counted ones

## Solution
- instead of generating a random UUID for each render resource, this implementation increments an atomic counter (a minimal sketch follows below)
- this might be replaced by the ids of wgpu if they expose them directly in the future
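
A minimal sketch of that counter scheme (illustrative names, not the actual macro-generated code):

```rust
use std::sync::atomic::{AtomicU32, Ordering};

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct BufferId(u32);

static NEXT_BUFFER_ID: AtomicU32 = AtomicU32::new(0);

impl BufferId {
    pub fn new() -> Self {
        // Each call hands out the next integer; no RNG or UUID generation.
        BufferId(NEXT_BUFFER_ID.fetch_add(1, Ordering::Relaxed))
    }
}

fn main() {
    let a = BufferId::new();
    let b = BufferId::new();
    assert_ne!(a, b);
}
```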

- I have not benchmarked this solution yet, but this should be slightly faster in theory.
- Bevymark does not seem to be affected much by this change, which is to be expected.

- Nothing in our API has changed, other than that the IDs have lost their IMO rather insignificant documentation.
- Maybe the documentation could be added back into the macro, but this would complicate the code.
2022-12-25 00:23:15 +00:00
robtfm
2cd0bd7575 improve compile time by type-erasing wgpu structs (#5950)
# Objective

structs containing wgpu types take a long time to compile. this is particularly bad for generics containing the wgpu structs (like the depth pipeline builder with `#[derive(SystemParam)]` i've been working on).

we can avoid that by boxing and type-erasing in the bevy `render_resource` wrappers.

type system magic is not a strength of mine, so i guess there will be a cleaner way to achieve this; happy to take feedback, or for it to be taken as a proof of concept if someone else wants to do a better job.

## Solution

- add macros to box and type-erase in debug mode (rough sketch below)
- leave current impl for release mode
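
a rough sketch of the debug-mode idea (illustrative only: this uses `Arc<dyn Any>` for brevity, while the actual macro erases the type more aggressively):

```rust
use std::any::Any;
use std::sync::Arc;

// debug builds: store the wgpu value behind an erased Arc so downstream
// generics over the wrapper don't drag the heavy wgpu types along.
#[cfg(debug_assertions)]
#[derive(Clone)]
pub struct Buffer {
    erased: Arc<dyn Any + Send + Sync>,
}

#[cfg(debug_assertions)]
impl Buffer {
    pub fn new(value: wgpu::Buffer) -> Self {
        Self { erased: Arc::new(value) }
    }

    pub fn get(&self) -> &wgpu::Buffer {
        self.erased
            .downcast_ref()
            .expect("Buffer always stores a wgpu::Buffer")
    }
}

// release builds: keep the direct wrapper with no extra indirection.
#[cfg(not(debug_assertions))]
#[derive(Clone)]
pub struct Buffer(Arc<wgpu::Buffer>);

#[cfg(not(debug_assertions))]
impl Buffer {
    pub fn new(value: wgpu::Buffer) -> Self {
        Self(Arc::new(value))
    }

    pub fn get(&self) -> &wgpu::Buffer {
        &self.0
    }
}
```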

timings:


|                         | current | with type-erasing | diff |
| ----------------------- | ------- | ----------------- | ---- |
| Total time              | 64.9s   | 49.0s             | -24% |
| bevy_pbr v0.9.0-dev     | 19.2s   | 8.7s              | -49% |
| bevy_render v0.9.0-dev  | 17.0s   | 12.0s             | -38% |
| bevy_sprite v0.9.0-dev  | 15.1s   | 6.1s              | -60% |
| DepthPipelineBuilder    | 18.7s   | 1.2s              | -94% |

the depth pipeline builder is a binary with body: 
```rust
use std::{marker::PhantomData, hash::Hash};
use bevy::{prelude::*, ecs::system::SystemParam, pbr::{RenderMaterials, MaterialPipeline, ShadowPipeline}, render::{renderer::RenderDevice, render_resource::{SpecializedMeshPipelines, PipelineCache}, render_asset::RenderAssets}};

fn main() {
    println!("Hello, world p!\n");
}

#[derive(SystemParam)]
pub struct DepthPipelineBuilder<'w, 's, M: Material> 
where M::Data: Eq + Hash + Clone,
{
    render_device: Res<'w, RenderDevice>,
    material_pipeline: Res<'w, MaterialPipeline<M>>,
    material_pipelines: ResMut<'w, SpecializedMeshPipelines<MaterialPipeline<M>>>,
    shadow_pipeline: Res<'w, ShadowPipeline>,
    pipeline_cache: ResMut<'w, PipelineCache>,
    render_meshes: Res<'w, RenderAssets<Mesh>>,
    render_materials: Res<'w, RenderMaterials<M>>,
    msaa: Res<'w, Msaa>,
    #[system_param(ignore)]
    _p: PhantomData<&'s M>,
}
```
2022-11-18 22:04:23 +00:00