2158f3d91f
7 Commits
bc34216929
Pack multiple vertex and index arrays together into growable buffers. (#14257)
This commit uses the [`offset-allocator`] crate to combine vertex and index arrays from different meshes into single buffers. Since the primary source of `wgpu` overhead is from validation and synchronization when switching buffers, this significantly improves Bevy's rendering performance on many scenes.

This patch is a more flexible version of #13218, which also used slabs. Unlike #13218, which used slabs of a fixed size, this commit implements slabs that start small and can grow. In addition to reducing memory usage, supporting slab growth reduces the number of vertex and index buffer switches that need to happen during rendering, leading to improved performance. To prevent pathological fragmentation behavior, slabs are capped to a maximum size, and mesh arrays that are too large get their own dedicated slabs.

As an additional improvement over #13218, this commit allows the application to customize all allocator heuristics. The `MeshAllocatorSettings` resource contains values that adjust the minimum and maximum slab sizes, the cutoff point at which meshes get their own dedicated slabs, and the rate at which slabs grow. Hopefully-sensible defaults have been chosen for each value.

Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which is necessary to pack vertex arrays from different meshes into the same buffer. `wgpu` represents this restriction as the downlevel flag `BASE_VERTEX`. This patch detects that bit and ensures that all vertex buffers get dedicated slabs on that platform. Even on WebGL 2, though, we can combine all *index* arrays into single buffers to reduce buffer changes, and we do so.

The following measurements are on Bistro:

* Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup).
* Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup).
* Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup).

## Migration Guide

### Changed

* Vertex and index buffers for meshes may now be packed alongside other buffers, for performance.
* `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that it no longer directly stores handles to GPU objects.
* Because meshes no longer have their own vertex and index buffers, the responsibility for the buffers has moved from `GpuMesh` (now called `RenderMesh`) to the `MeshAllocator` resource. To access the vertex data for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index data for a mesh, use `MeshAllocator::mesh_index_slice`.

[`offset-allocator`]: https://github.com/pcwalton/offset-allocator
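As a minimal sketch of tuning those heuristics, the snippet below overrides a couple of values at app startup. The module path and field names (`growth_factor`, `large_threshold`) are assumptions based on the description above; check the current `bevy_render` docs for the exact definition of `MeshAllocatorSettings`.

```rust
use bevy::prelude::*;
// Assumed path: the settings resource lives in bevy_render's mesh allocator module.
use bevy::render::mesh::allocator::MeshAllocatorSettings;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // Override the allocator heuristics; unset fields keep their defaults.
        // Field names here are assumptions illustrating the knobs described above.
        .insert_resource(MeshAllocatorSettings {
            growth_factor: 2.0,           // how quickly a slab grows when it fills up
            large_threshold: 1024 * 1024, // meshes above this size get dedicated slabs
            ..default()
        })
        .run();
}
```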
11817f4ba4
Generate `MeshUniform`s on the GPU via compute shader where available. (#12773)
Currently, `MeshUniform`s are rather large: 160 bytes. They're also somewhat expensive to compute, because they involve taking the inverse of a 3x4 matrix. Finally, if a mesh is present in multiple views, that mesh will have a separate `MeshUniform` for each and every view, which is wasteful.

This commit fixes these issues by introducing the concept of a *mesh input uniform* and adding a *mesh uniform building* compute shader pass. The `MeshInputUniform` is simply the minimum amount of data needed for the GPU to compute the full `MeshUniform`. Most of this data is just the transform and is therefore only 64 bytes. `MeshInputUniform`s are computed during the *extraction* phase, much like skins are today, in order to avoid needlessly copying transforms around on the CPU. (In fact, the render app has been changed to only store the translation of each mesh; it no longer cares about any other part of the transform, which is stored only on the GPU and in the main world.)

Before rendering, the `build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the full `MeshUniform`s. The mesh uniform building pass does the following, all on the GPU:

1. Copy the appropriate fields of the `MeshInputUniform` to the `MeshUniform` slot. If a single mesh is present in multiple views, this effectively duplicates it into each view.
2. Compute the inverse transpose of the model transform, used for transforming normals.
3. If applicable, copy the mesh's transform from the previous frame for TAA. To support this, we double-buffer the `MeshInputUniform`s over two frames and swap the buffers each frame. The `MeshInputUniform`s for the current frame contain the index of that mesh's `MeshInputUniform` for the previous frame.

This commit produces wins in virtually every CPU part of the pipeline: `extract_meshes`, `queue_material_meshes`, `batch_and_prepare_render_phase`, and especially `write_batched_instance_buffer` are all faster. Shrinking the amount of CPU data that has to be shuffled around speeds up the entire rendering process.

| Benchmark              | This branch | `main`  | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc`      | 17.259      | 24.529  | 42.12%  |
| `many_cubes -nfc -vpi` | 302.116     | 312.123 | 3.31%   |
| `many_foxes`           | 3.227       | 3.515   | 8.92%   |

Because mesh uniform building requires compute shaders, and WebGL 2 has no compute shaders, the existing CPU mesh uniform building code has been left as-is. Many types now have both CPU mesh uniform building and GPU mesh uniform building modes. Developers can opt into the old CPU mesh uniform building by setting the `use_gpu_uniform_builder` option on `PbrPlugin` to `false`.

Below are graphs of the CPU portions of `many_cubes --no-frustum-culling`. Yellow is this branch, red is `main`.

* `extract_meshes`: It's notable that we get a small win even though we're now writing to a GPU buffer.
* `queue_material_meshes`: There's a bit of a regression here; not sure what's causing it. In any case it's very much outweighed by the other gains.
* `batch_and_prepare_render_phase`: There's a huge win here, enough to make batching basically drop off the profile.
* `write_batched_instance_buffer`: There's a massive improvement here, as expected. Note that a lot of it simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't a maintainability problem in my view because `MeshInputUniform` is so simple: just 16 tightly-packed words.)

## Changelog

### Added

* Per-mesh instance data is now generated on the GPU with a compute shader instead of the CPU, resulting in rendering performance improvements on platforms where compute shaders are supported.

## Migration Guide

* Custom render phases now need multiple systems beyond just `batch_and_prepare_render_phase`. Code that was previously creating custom render phases should now add a `BinnedRenderPhasePlugin` or `SortedRenderPhasePlugin` as appropriate instead of directly adding `batch_and_prepare_render_phase`.
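As a sketch of the opt-out, assuming `use_gpu_uniform_builder` is a plain `bool` field on `PbrPlugin` as described above (treat the field name as an assumption from this description, not a confirmed API), the old CPU path can be selected when the plugin group is built:

```rust
use bevy::pbr::PbrPlugin;
use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins.set(PbrPlugin {
            // Opt back into CPU mesh uniform building, e.g. when targeting
            // platforms without compute shaders. Field name per the
            // description above; other settings keep their defaults.
            use_gpu_uniform_builder: false,
            ..default()
        }))
        .run();
}
```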
ab7cbfa8fc
Consolidate Render(Ui)Materials(2d) into RenderAssets (#12827)
# Objective

- Replace `RenderMaterials` / `RenderMaterials2d` / `RenderUiMaterials` with `RenderAssets`, so that a change made once to `RenderAssets` applies to all use cases, rather than having to be duplicated across multiple things that should be one thing.
- Adopts #8149.

## Solution

- Make `RenderAsset` generic over the destination type rather than the source type, as in #8149.
- Use `RenderAssets<PreparedMaterial<M>>` etc. for render materials.

---

## Changelog

- Changed:
  - The `RenderAsset` trait is now implemented on the destination type. Its `SourceAsset` associated type refers to the type of the source asset.
  - `RenderMaterials`, `RenderMaterials2d`, and `RenderUiMaterials` have been replaced by `RenderAssets<PreparedMaterial<M>>` and similar.

## Migration Guide

- `RenderAsset` is now implemented for the destination type rather than the source asset type. The source asset type is now the `RenderAsset` trait's `SourceAsset` associated type (see the sketch below).
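To illustrate the inversion with a self-contained example, here is a simplified stand-in for the real `RenderAsset` trait (the actual trait has additional bounds, a system-param type, and a fallible `prepare_asset` method). The point is only that the implementation now lives on the destination (render-world) type, which names its source via `SourceAsset`:

```rust
// Simplified stand-in for bevy_render's `RenderAsset` trait.
trait RenderAsset: Sized {
    /// The main-world asset this render-world representation is built from.
    type SourceAsset;
    fn prepare(source: &Self::SourceAsset) -> Self;
}

// Main-world asset (source).
struct MyMaterial {
    base_color: [f32; 4],
}

// Render-world representation (destination); the trait is implemented here.
struct PreparedMyMaterial {
    color_uniform: [f32; 4],
}

impl RenderAsset for PreparedMyMaterial {
    type SourceAsset = MyMaterial;

    fn prepare(source: &MyMaterial) -> Self {
        PreparedMyMaterial {
            color_uniform: source.base_color,
        }
    }
}

fn main() {
    let material = MyMaterial { base_color: [1.0, 0.5, 0.25, 1.0] };
    let prepared = PreparedMyMaterial::prepare(&material);
    assert_eq!(prepared.color_uniform, material.base_color);
}
```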
01649f13e2
Refactor App and SubApp internals for better separation (#9202)
# Objective

This is a necessary precursor to #9122 (this was split from that PR to reduce the amount of code to review all at once). Moving `!Send` resource ownership to `App` will make it unambiguously `!Send`. `SubApp` must be `Send`, so it can't wrap `App`.

## Solution

Refactor `App` and `SubApp` to not have a recursive relationship. Since `SubApp` no longer wraps `App`, once `!Send` resources are moved out of `World` and into `App`, `SubApp` will become unambiguously `Send`. There could be less code duplication between `App` and `SubApp`, but that would break `App` method chaining.

## Changelog

- `SubApp` no longer wraps `App`.
- `App` fields are no longer publicly accessible.
- `App` can no longer be converted into a `SubApp`.
- Various methods now return references to a `SubApp` instead of an `App`.

## Migration Guide

- To construct a sub-app, use `SubApp::new()`. `App` can no longer convert into `SubApp`.
- If you implemented a trait for `App`, you may want to implement it for `SubApp` as well.
- If you're accessing `app.world` directly, you now have to use `app.world()` and `app.world_mut()` (see the sketch below).
- `App::sub_app` now returns `&SubApp`.
- `App::sub_app_mut` now returns `&mut SubApp`.
- `App::get_sub_app` now returns `Option<&SubApp>`.
- `App::get_sub_app_mut` now returns `Option<&mut SubApp>`.
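A minimal migration sketch using the accessor names from the migration guide above; the use of the `RenderApp` sub-app label from `bevy_render` is just an example of a sub-app to look up:

```rust
use bevy::prelude::*;
use bevy::render::RenderApp;

fn main() {
    let mut app = App::new();
    app.add_plugins(DefaultPlugins);

    // Before: `app.world.spawn_empty();` -- the field is no longer public.
    app.world_mut().spawn_empty();
    println!("main-world entities: {}", app.world().entities().len());

    // Sub-app accessors now hand back `SubApp` (wrapped in `Option` for the
    // `get_` variants) rather than `App`.
    if let Some(render_app) = app.get_sub_app(RenderApp) {
        // `SubApp` exposes the same `world()` / `world_mut()` accessors.
        println!("render-world entities: {}", render_app.world().entities().len());
    }
}
```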
f9cc91d5a1
Intern mesh vertex buffer layouts so that we don't have to compare them over and over. (#12216)
Although we cached hashes of `MeshVertexBufferLayout`, we were paying the cost of `PartialEq` on `InnerMeshVertexBufferLayout` for every entity, every frame. This patch changes that logic to place `MeshVertexBufferLayout`s in `Arc`s so that they can be compared and hashed by pointer. This results in a 28% speedup in the `queue_material_meshes` phase of `many_cubes`, with frustum culling disabled.

Additionally, this patch contains two minor changes:

1. This commit flattens the specialized mesh pipeline cache to one level of hash tables instead of two. This saves a hash lookup.
2. The example `many_cubes` has been given a `--no-frustum-culling` flag, to aid in benchmarking.

See the Tracy profile: [screenshot](https://github.com/bevyengine/bevy/assets/157897/18632f1d-1fdd-4ac7-90ed-2d10306b2a1e)

## Migration Guide

* Duplicate `MeshVertexBufferLayout`s are now combined into a single object, `MeshVertexBufferLayoutRef`, which contains an atomically-reference-counted pointer to the layout. Code that was using `MeshVertexBufferLayout` may need to be updated to use `MeshVertexBufferLayoutRef` instead.
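A self-contained sketch of the underlying technique (not Bevy's actual types): wrap the expensive-to-compare layout in an `Arc` and compare/hash it by pointer, so per-entity pipeline-cache lookups never walk the full layout again.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Stand-in for the real inner layout data, which is costly to compare.
#[derive(Debug)]
struct InnerLayout {
    attributes: Vec<&'static str>,
}

// Interned handle: equality and hashing go through the pointer, not the contents.
#[derive(Clone, Debug)]
struct LayoutRef(Arc<InnerLayout>);

impl PartialEq for LayoutRef {
    fn eq(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0) // O(1) pointer comparison
    }
}
impl Eq for LayoutRef {}

impl Hash for LayoutRef {
    fn hash<H: Hasher>(&self, state: &mut H) {
        (Arc::as_ptr(&self.0) as usize).hash(state) // hash the pointer, not the layout
    }
}

fn main() {
    let layout = LayoutRef(Arc::new(InnerLayout {
        attributes: vec!["position", "normal", "uv"],
    }));

    // A specialized-pipeline cache keyed by the interned layout reference.
    let mut pipeline_cache: HashMap<LayoutRef, u32> = HashMap::new();
    pipeline_cache.insert(layout.clone(), 42);
    assert_eq!(pipeline_cache.get(&layout), Some(&42));
}
```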
1c67e020f7
Move EntityHash related types into bevy_ecs (#11498)
# Objective

Reduce the size of `bevy_utils` (https://github.com/bevyengine/bevy/issues/11478).

## Solution

Move `EntityHash`-related types into `bevy_ecs`. This also gives us access to `Entity`, which means we no longer need `EntityHashMap`'s first generic argument.

---

## Changelog

- Moved `bevy::utils::{EntityHash, EntityHasher, EntityHashMap, EntityHashSet}` into `bevy::ecs::entity::hash`.
- Removed `EntityHashMap`'s first generic argument. It is now hardcoded to always be `Entity`.

## Migration Guide

- Uses of `bevy::utils::{EntityHash, EntityHasher, EntityHashMap, EntityHashSet}` now have to be imported from `bevy::ecs::entity::hash`.
- Uses of `EntityHashMap` no longer have to specify the first generic parameter. It is now hardcoded to always be `Entity` (see the sketch below).
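A before/after migration sketch. The description names `bevy::ecs::entity::hash` as the new home; the types are commonly reachable from `bevy::ecs::entity` via re-exports, so treat the exact import path as something to confirm against your Bevy version.

```rust
// Before the change:
// use bevy::utils::EntityHashMap;
// let mut names: EntityHashMap<Entity, String> = EntityHashMap::default();

// After the change: import from bevy_ecs and drop the key type, which is now
// always `Entity`. (Exact re-export path may differ between releases.)
use bevy::ecs::entity::{Entity, EntityHashMap};

fn main() {
    let mut names: EntityHashMap<String> = EntityHashMap::default();
    let entity = Entity::from_raw(42);
    names.insert(entity, "player".to_owned());
    assert_eq!(names.get(&entity).map(String::as_str), Some("player"));
}
```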
dd14f3a477
Implement lightmaps. (#10231)
# Objective

Lightmaps, textures that store baked global illumination, have been a mainstay of real-time graphics for decades. Bevy currently has no support for them, so this pull request implements them.

## Solution

The new `Lightmap` component can be attached to any entity that contains a `Handle<Mesh>` and a `StandardMaterial`. When present, it will be applied in the PBR shader. Because multiple lightmaps are frequently packed into atlases, each lightmap may have its own UV boundaries within its texture. An `exposure` field is also provided, to control the brightness of the lightmap.

Note that this PR doesn't provide any way to bake the lightmaps. That can be done with [The Lightmapper] or another solution, such as Unity's Bakery.

---

## Changelog

### Added

* A new component, `Lightmap`, is available, for baked global illumination. If your mesh has a second UV channel (UV1) and you attach this component to the entity with that mesh, Bevy will apply the texture referenced in the lightmap.

[The Lightmapper]: https://github.com/Naxela/The_Lightmapper

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
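A minimal usage sketch, assuming the component exposes an `image` handle and a `uv_rect` giving its region within the atlas; both field names, and the asset path, are assumptions based on the description above. The query uses the `Handle<Mesh>` / `Handle<StandardMaterial>` components in use at the time of this commit (newer Bevy versions wrap these differently).

```rust
use bevy::pbr::Lightmap;
use bevy::prelude::*;

// Attach a baked lightmap to every entity that already has a mesh and a
// StandardMaterial, since the component only takes effect alongside those.
fn add_lightmaps(
    mut commands: Commands,
    asset_server: Res<AssetServer>,
    lightmappable: Query<Entity, (With<Handle<Mesh>>, With<Handle<StandardMaterial>>)>,
) {
    // Baked global-illumination texture, typically an atlas shared by many meshes.
    let lightmap: Handle<Image> = asset_server.load("lightmaps/level_lightmap.ktx2");

    for entity in &lightmappable {
        commands.entity(entity).insert(Lightmap {
            image: lightmap.clone(),
            // This mesh's UV boundaries within the atlas (here: the whole texture).
            uv_rect: Rect::new(0.0, 0.0, 1.0, 1.0),
        });
    }
}
```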