Commit Graph

5 Commits

Author SHA1 Message Date
vero
7eadc1d467
Zero Copy Mesh (#15569)
# Objective

- Another step towards #15558

## Solution

- Instead of allocating a Vec and then having wgpu copy it into a
staging buffer, write directly into the staging buffer.
- gets rid of another hidden copy, in `pad_to_alignment`.

future work:
- why is there a gcd implementation in here (and its subpar, use
binary_gcd. its in the hot path, run twice for every mesh, every frame i
think?) make it better and put it in bevy_math
- zero-copy custom mesh api to avoid having to write out a Mesh from a
custom rep

## Testing

- lighting and many_cubes run fine (and slightly faster. havent
benchmarked though)

---

## Showcase

- look ma... no copies

at least when RenderAssetUsage is GPU only :3

---------

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: Kristoffer Søholm <k.soeholm@gmail.com>
2024-10-04 21:24:44 +00:00
vero
6465e3bd9f
Fix Mesh allocator bug and reduce Mesh data copies by two (#15566)
# Objective

- First step towards #15558

## Solution

- Rename `get_vertex_buffer_data` to `create_packed_vertex_buffer_data`
to make it clear that it is not "free" and actually allocates
- Compute length analytically for preallocation instead of creating the
buffer to get its length and immediately discard it
- Use existing vertex attribute size calculation method to reduce code
duplication
- Fix a bug where mesh index data was being replaced by unnecessarily
newly created mesh vertex data in some cases
- Overall reduces mesh copies by two. We still have plenty to go, but
these were the easy ones.

## Testing

- I ran 3d_scene, lighting, and many_cubes, they look fine.
- Benchmarks would be nice, but this is very obviously a win in perf and
correctness.

---

## Migration Guide

- `Mesh::create_packed_vertex_buffer_data` has been renamed
`Mesh::create_packed_vertex_buffer_data` to reflect the fact that it
copies data and allocates.

## Showcase

- look mom, less copies
2024-10-01 17:15:57 +00:00
Zachary Harrold
d70595b667
Add core and alloc over std Lints (#15281)
# Objective

- Fixes #6370
- Closes #6581

## Solution

- Added the following lints to the workspace:
  - `std_instead_of_core`
  - `std_instead_of_alloc`
  - `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.

## Testing

- Ran CI locally

## Migration Guide

The MSRV is now 1.81. Please update to this version or higher.

## Notes

- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
Eero Lehtinen
db525e660e
Fix MeshAllocator panic (#14560)
# Objective

 Fixes #14540

## Solution

- Clean slab layouts from stale `SlabId`s when freeing meshes
- Technically performance requirements of freeing now increase based on
the number of existing meshes, but maybe it doesn't matter too much in
practice
- This was the case before this PR too, but it's technically possible to
free and allocate 2^32 times and overflow with `SlabId`s and cause
incorrect behavior. It looks like new meshes would then override old
ones.

## Testing

- Tested in `loading_screen` example and tapping keyboard 1 and 2.
2024-09-16 22:54:01 +00:00
Patrick Walton
bc34216929
Pack multiple vertex and index arrays together into growable buffers. (#14257)
This commit uses the [`offset-allocator`] crate to combine vertex and
index arrays from different meshes into single buffers. Since the
primary source of `wgpu` overhead is from validation and synchronization
when switching buffers, this significantly improves Bevy's rendering
performance on many scenes.

This patch is a more flexible version of #13218, which also used slabs.
Unlike #13218, which used slabs of a fixed size, this commit implements
slabs that start small and can grow. In addition to reducing memory
usage, supporting slab growth reduces the number of vertex and index
buffer switches that need to happen during rendering, leading to
improved performance. To prevent pathological fragmentation behavior,
slabs are capped to a maximum size, and mesh arrays that are too large
get their own dedicated slabs.

As an additional improvement over #13218, this commit allows the
application to customize all allocator heuristics. The
`MeshAllocatorSettings` resource contains values that adjust the minimum
and maximum slab sizes, the cutoff point at which meshes get their own
dedicated slabs, and the rate at which slabs grow. Hopefully-sensible
defaults have been chosen for each value.

Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which
is necessary to pack vertex arrays from different meshes into the same
buffer. `wgpu` represents this restriction as the downlevel flag
`BASE_VERTEX`. This patch detects that bit and ensures that all vertex
buffers get dedicated slabs on that platform. Even on WebGL 2, though,
we can combine all *index* arrays into single buffers to reduce buffer
changes, and we do so.

The following measurements are on Bistro:

Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup):
![Screenshot 2024-07-09
163521](https://github.com/bevyengine/bevy/assets/157897/5d83c824-c0ee-434c-bbaf-218ff7212c48)

Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup):
![Screenshot 2024-07-09
163559](https://github.com/bevyengine/bevy/assets/157897/d94e2273-c3a0-496a-9f88-20d394129610)

Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup):
![Screenshot 2024-07-09
163536](https://github.com/bevyengine/bevy/assets/157897/e4ef6e48-d60e-44ae-9a71-b9a731c99d9a)

## Migration Guide

### Changed

* Vertex and index buffers for meshes may now be packed alongside other
buffers, for performance.
* `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that
it no longer directly stores handles to GPU objects.
* Because meshes no longer have their own vertex and index buffers, the
responsibility for the buffers has moved from `GpuMesh` (now called
`RenderMesh`) to the `MeshAllocator` resource. To access the vertex data
for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index
data for a mesh, use `MeshAllocator::mesh_index_slice`.

[`offset-allocator`]: https://github.com/pcwalton/offset-allocator
2024-07-16 20:33:15 +00:00