# Objective
In the Render World, there are a number of collections that are derived
from Main World entities and are used to drive rendering. The most
notable are:
- `VisibleEntities`, which is generated in the `check_visibility` system
and contains visible entities for a view.
- `ExtractedInstances`, which maps entity ids to asset ids.
In the old model, these collections were trivially kept in sync -- any
extracted phase item could look itself up because the render entity id
was guaranteed to always match the corresponding main world id.
After #15320, this became much more complicated, and was leading to a
number of subtle bugs in the Render World. The main rendering systems,
i.e. `queue_material_meshes` and `queue_material2d_meshes`, follow a
similar pattern:
```rust
for visible_entity in visible_entities.iter::<With<Mesh2d>>() {
    let Some(mesh_instance) = render_mesh_instances.get_mut(visible_entity) else {
        continue;
    };
    // Look some more stuff up and specialize the pipeline...
    let bin_key = Opaque2dBinKey {
        pipeline: pipeline_id,
        draw_function: draw_opaque_2d,
        asset_id: mesh_instance.mesh_asset_id.into(),
        material_bind_group_id: material_2d.get_bind_group_id().0,
    };
    opaque_phase.add(
        bin_key,
        *visible_entity,
        BinnedRenderPhaseType::mesh(mesh_instance.automatic_batching),
    );
}
```
In this case, `visible_entities` and `render_mesh_instances` are both
collections that are created and keyed by Main World entity ids, and so
this lookup happens to work by coincidence. However, there is a major
unintentional bug here: namely, because `visible_entities` is a
collection of Main World ids, the phase item being queued is created
with a Main World id rather than its correct Render World id.
This happens to not break mesh rendering because the render commands
used for drawing meshes do not access the `ItemQuery` parameter, but
demonstrates the confusion that is now possible: our UI phase items are
correctly being queued with Render World ids while our meshes aren't.
Additionally, this makes it very easy to accidentally use the wrong
entity id to look up things like assets. For example, if instead we
ignored visibility checks and queued our meshes via a query, we'd have
to be extra careful to use `&MainEntity` instead of the natural
`Entity`.
## Solution
Make all collections that are derived from Main World data use
`MainEntity` as their key, to ensure type safety and avoid accidentally
looking up data with the wrong entity id:
```rust
pub type MainEntityHashMap<V> = hashbrown::HashMap<MainEntity, V, EntityHash>;
```
Additionally, we make every `PhaseItem` able to provide both its Main
and Render World ids, to allow render phase implementors maximum
flexibility as to which id should be used to look up data.
You can think of this as tracking at the type level whether something
in the Render World should use its "primary key", i.e. the entity id, or
needs to use a foreign key, i.e. the `MainEntity`.
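For illustration, here is a rough sketch of how the earlier queue loop could
look under this scheme, assuming the visible-entity iterator yields
`(Entity, MainEntity)` pairs; the exact iterator and `add` signatures are
approximations, not the final API:
```rust
// Sketch: the `MainEntity` drives the instance lookup, while the phase item
// records both ids.
for (render_entity, visible_entity) in visible_entities.iter::<With<Mesh2d>>() {
    // `render_mesh_instances` is a `MainEntityHashMap`, so only a
    // `MainEntity` can be used for this lookup.
    let Some(mesh_instance) = render_mesh_instances.get_mut(visible_entity) else {
        continue;
    };
    let bin_key = Opaque2dBinKey {
        pipeline: pipeline_id,
        draw_function: draw_opaque_2d,
        asset_id: mesh_instance.mesh_asset_id.into(),
        material_bind_group_id: material_2d.get_bind_group_id().0,
    };
    // The phase item is queued with both its Render World and Main World ids.
    opaque_phase.add(
        bin_key,
        (*render_entity, *visible_entity),
        BinnedRenderPhaseType::mesh(mesh_instance.automatic_batching),
    );
}
```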
## Testing
##### TODO:
This will require extensive testing to make sure things didn't break!
Additionally, some extraction logic has become more complicated and
needs to be checked for regressions.
## Migration Guide
With the advent of the retained render world, collections that contain
references to `Entity` that are extracted into the render world have
been changed to contain `MainEntity` in order to prevent errors where a
render world entity id is used to look up an item by accident. Custom
rendering code may need to be changed to query for `&MainEntity` in
order to look up the correct item from such a collection. Additionally,
users who implement their own extraction logic for collections of main
world entities should strongly consider extracting into a different
collection that uses `MainEntity` as a key.
Additionally, render phases now require specifying both the `Entity` and
`MainEntity` for a given `PhaseItem`. Custom render phases should ensure
`MainEntity` is available when queuing a phase item.
# Objective
`EntityHash` and related types were moved from `bevy_utils` to
`bevy_ecs` in #11498, but seemed to have been accidentally reintroduced
a week later in #11707.
## Solution
Remove the old leftover code.
---
## Migration Guide
- Uses of `bevy::utils::{EntityHash, EntityHasher, EntityHashMap,
EntityHashSet}` now have to be imported from `bevy::ecs::entity`.
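In code, the migration is just an import path change (paths as given above):
```rust
// Before:
// use bevy::utils::{EntityHashMap, EntityHashSet};

// After:
use bevy::ecs::entity::{EntityHashMap, EntityHashSet};
```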
# Objective
The Android example on Adreno 642L currently crashes on startup.
Previous PRs #14176 and #13323 have addressed this specific crash
occurring on some Adreno GPUs. That fix works as it should, but isn't
applied when the GPU name contains a suffix, as in the case of
`642L`.
## Solution
- Amend the logic to filter out any parts of the GPU name that don't
contain digits, thus enabling the fix on `642L`.
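A minimal sketch of that filtering idea (the function name and exact parsing
here are illustrative, not the actual Bevy code):
```rust
/// Extracts an Adreno model number from an adapter name such as
/// "Adreno (TM) 642L": keep only the token that contains digits, then drop
/// any non-digit characters (like the trailing "L") before parsing.
fn adreno_model_number(adapter_name: &str) -> Option<u32> {
    let token = adapter_name
        .split_whitespace()
        .find(|token| token.chars().any(|c| c.is_ascii_digit()))?;
    let digits: String = token.chars().filter(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}
```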
## Testing
- Ran the Android example on a Nothing Phone 1. Before this change it
crashed; after it, the example works as intended.
---------
Co-authored-by: Sam Pettersson <sam.pettersson@geoguessr.com>
# Objective
- Fixes #14295
## Solution
- Early out when `GFBD::get_index_and_compare_data` returns `None`.
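A minimal, self-contained illustration of the early-out pattern (the data and
names here are placeholders standing in for the real
`GFBD::get_index_and_compare_data` call site):
```rust
fn queue_batches(per_entity_data: &[Option<(u32, u64)>]) {
    for data in per_entity_data {
        // If no index/compare data could be produced for this entity,
        // skip it rather than batching it with bogus values.
        let Some((index, compare_data)) = data else {
            continue;
        };
        println!("batching index {index}, compare data {compare_data}");
    }
}
```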
## Testing
- Tested on a selection of examples including `many_foxes` and
`3d_shapes`.
- Resolved the original issue in `bevy_vector_shapes`.
This commit uses the [`offset-allocator`] crate to combine vertex and
index arrays from different meshes into single buffers. Since the
primary source of `wgpu` overhead is from validation and synchronization
when switching buffers, this significantly improves Bevy's rendering
performance on many scenes.
This patch is a more flexible version of #13218, which also used slabs.
Unlike #13218, which used slabs of a fixed size, this commit implements
slabs that start small and can grow. In addition to reducing memory
usage, supporting slab growth reduces the number of vertex and index
buffer switches that need to happen during rendering, leading to
improved performance. To prevent pathological fragmentation behavior,
slabs are capped to a maximum size, and mesh arrays that are too large
get their own dedicated slabs.
As an additional improvement over #13218, this commit allows the
application to customize all allocator heuristics. The
`MeshAllocatorSettings` resource contains values that adjust the minimum
and maximum slab sizes, the cutoff point at which meshes get their own
dedicated slabs, and the rate at which slabs grow. Hopefully-sensible
defaults have been chosen for each value.
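A sketch of how an application might plug in its own settings; the import path
is approximate, and no fields are overridden here since their concrete names
aren't spelled out above. Adjust the slab sizes, dedicated-slab cutoff, and
growth rate on `MeshAllocatorSettings` as needed:
```rust
use bevy::prelude::*;
use bevy::render::mesh::allocator::MeshAllocatorSettings;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // Start from the defaults and tweak the heuristics before inserting.
        .insert_resource(MeshAllocatorSettings::default())
        .run();
}
```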
Unfortunately, WebGL 2 doesn't support the *base vertex* feature, which
is necessary to pack vertex arrays from different meshes into the same
buffer. `wgpu` represents this restriction as the downlevel flag
`BASE_VERTEX`. This patch detects that bit and ensures that all vertex
buffers get dedicated slabs on that platform. Even on WebGL 2, though,
we can combine all *index* arrays into single buffers to reduce buffer
changes, and we do so.
The following measurements are on Bistro:
Overall frame time improves from 8.74 ms to 5.53 ms (1.58x speedup):

Render system time improves from 6.57 ms to 3.54 ms (1.86x speedup):

Opaque pass time improves from 4.64 ms to 2.33 ms (1.99x speedup):

## Migration Guide
### Changed
* Vertex and index buffers for meshes may now be packed alongside other
buffers, for performance.
* `GpuMesh` has been renamed to `RenderMesh`, to reflect the fact that
it no longer directly stores handles to GPU objects.
* Because meshes no longer have their own vertex and index buffers, the
responsibility for the buffers has moved from `GpuMesh` (now called
`RenderMesh`) to the `MeshAllocator` resource. To access the vertex data
for a mesh, use `MeshAllocator::mesh_vertex_slice`. To access the index
data for a mesh, use `MeshAllocator::mesh_index_slice`.
[`offset-allocator`]: https://github.com/pcwalton/offset-allocator
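A sketch of the migration note above, fetching a mesh's slices through the new
resource (import paths, signatures, and field access are approximate):
```rust
use bevy::asset::AssetId;
use bevy::prelude::*;
use bevy::render::mesh::allocator::MeshAllocator;

// Render-world helper: given a mesh's asset id (obtained elsewhere, e.g. from
// the extracted mesh instances), ask the allocator where its data lives.
fn log_mesh_slices(mesh_allocator: &MeshAllocator, mesh_id: AssetId<Mesh>) {
    if let Some(vertex_slice) = mesh_allocator.mesh_vertex_slice(&mesh_id) {
        info!("vertex data occupies range {:?} of a shared buffer", vertex_slice.range);
    }
    if let Some(index_slice) = mesh_allocator.mesh_index_slice(&mesh_id) {
        info!("index data occupies range {:?} of a shared buffer", index_slice.range);
    }
}
```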
# Objective
Fixes #14146
## Solution
Expansion of #13323, excluding Adreno 730 and earlier.
## Testing
Tested on an Android device (Adreno 730) that used to crash
As reported in #14004, many third-party plugins, such as Hanabi, enqueue
entities that don't have meshes into render phases. However, the
introduction of indirect mode added a dependency on mesh-specific data,
breaking this workflow. This is because GPU preprocessing requires that
the render phases manage indirect draw parameters, which don't apply to
objects that aren't meshes. The existing code skips over binned entities
that don't have indirect draw parameters, which causes the rendering to
be skipped for such objects.
To support this workflow, this commit adds a new field,
`non_mesh_items`, to `BinnedRenderPhase`. This field contains a simple
list of (bin key, entity) pairs. After drawing batchable and unbatchable
objects, the non-mesh items are drawn one after another. Bevy itself
doesn't enqueue any items into this list; it exists solely for the
application and/or plugins to use.
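To make the shape of this concrete, here's a toy model of the structure
described above (not Bevy's actual types); the real API is demonstrated by the
new `custom_phase_item` example:
```rust
use std::collections::BTreeMap;

// Toy model: a binned phase keeps its usual batchable/unbatchable bins, plus a
// plain list of (bin key, entity) pairs for items that have no mesh and thus
// no indirect draw parameters. The non-mesh items are drawn after the bins.
struct ToyBinnedPhase<K: Ord, E> {
    batchable_bins: BTreeMap<K, Vec<E>>,
    unbatchable_bins: BTreeMap<K, Vec<E>>,
    // Bevy never fills this itself; it exists for applications and plugins.
    non_mesh_items: Vec<(K, E)>,
}

impl<K: Ord, E> ToyBinnedPhase<K, E> {
    fn add_non_mesh_item(&mut self, key: K, entity: E) {
        self.non_mesh_items.push((key, entity));
    }
}
```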
Additionally, this commit switches the asset ID in the standard bin keys
to be an untyped asset ID rather than that of a mesh. This allows more
flexibility, allowing bins to be keyed off any type of asset.
This patch adds a new example, `custom_phase_item`, which simultaneously
serves to demonstrate how to use this new feature and to act as a
regression test so this doesn't break again.
Fixes #14004.
## Changelog
### Added
* `BinnedRenderPhase` now contains a `non_mesh_items` field for plugins
to add custom items to.
# Objective
- Fixes #13728
## Solution
- Add a new feature, `smaa_luts`. If enabled, it also enables `ktx2` and
`zstd`. If not, the files aren't loaded and placeholders are used instead.
- Add all the needed resources in the same places where the systems that use
them are added.
# Objective
- Fixes #13038
## Solution
- Disable GPU preprocessing when the
`SAMPLED_TEXTURE_AND_STORAGE_BUFFER_ARRAY_NON_UNIFORM_INDEXING` feature is not
available
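A minimal sketch of the capability check involved (the helper name is
illustrative; Bevy's actual gating lives in its GPU preprocessing setup):
```rust
// Returns whether the adapter exposes the wgpu feature that GPU
// preprocessing relies on; if not, the renderer falls back to the CPU path.
fn gpu_preprocessing_supported(features: wgpu::Features) -> bool {
    features.contains(
        wgpu::Features::SAMPLED_TEXTURE_AND_STORAGE_BUFFER_ARRAY_NON_UNIFORM_INDEXING,
    )
}
```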
## Testing
- Tested on an Android device that used to crash
This commit makes us stop using the render world ECS for
`BinnedRenderPhase` and `SortedRenderPhase` and instead use resources
with `EntityHashMap`s inside. There are three reasons to do this:
1. We can use `clear()` to clear out the render phase collections
instead of recreating the components from scratch, allowing us to reuse
allocations.
2. This is a prerequisite for retained bins, because components can't be
retained from frame to frame in the render world, but resources can.
3. We want to move away from storing anything in components in the
render world ECS, and this is a step in that direction.
This patch results in a small performance benefit, due to point (1)
above.
## Changelog
### Changed
* The `BinnedRenderPhase` and `SortedRenderPhase` render world
components have been replaced with `ViewBinnedRenderPhases` and
`ViewSortedRenderPhases` resources.
## Migration Guide
* The `BinnedRenderPhase` and `SortedRenderPhase` render world
components have been replaced with `ViewBinnedRenderPhases` and
`ViewSortedRenderPhases` resources. Instead of querying for the
components, look the camera entity up in the
`ViewBinnedRenderPhases`/`ViewSortedRenderPhases` tables.
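A sketch of that lookup for an `Opaque3d` phase (system shape and import paths
are approximate):
```rust
use bevy::core_pipeline::core_3d::Opaque3d;
use bevy::prelude::*;
use bevy::render::render_phase::ViewBinnedRenderPhases;
use bevy::render::view::ExtractedView;

// Render-world system: instead of querying for a `BinnedRenderPhase<Opaque3d>`
// component on the view entity, index into the resource keyed by that entity.
fn queue_custom_items(
    opaque_phases: Res<ViewBinnedRenderPhases<Opaque3d>>,
    views: Query<Entity, With<ExtractedView>>,
) {
    for view_entity in &views {
        let Some(_opaque_phase) = opaque_phases.get(&view_entity) else {
            continue;
        };
        // ... queue phase items into the phase for this view here ...
    }
}
```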
This is an adoption of #12670 plus some documentation fixes. See that PR
for more details.
---
## Changelog
* Renamed `BufferVec` to `RawBufferVec` and added a new `BufferVec`
type.
## Migration Guide
`BufferVec` has been renamed to `RawBufferVec` and a new similar type
has taken the `BufferVec` name.
---------
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
In #12889, I mistakenly started dropping unbatchable sorted items on the
floor instead of giving them solitary batches. This caused the objects
in the `shader_instancing` demo to stop showing up. This patch fixes the
issue by giving those items their own batches as expected.
Fixes #13130.
This commit implements opt-in GPU frustum culling, built on top of the
infrastructure in https://github.com/bevyengine/bevy/pull/12773. To
enable it on a camera, add the `GpuCulling` component to it. To
additionally disable CPU frustum culling, add the `NoCpuCulling`
component. Note that adding `GpuCulling` without `NoCpuCulling`
*currently* does nothing useful. The reason why `GpuCulling` doesn't
automatically imply `NoCpuCulling` is that I intend to follow this patch
up with GPU two-phase occlusion culling, and CPU frustum culling plus
GPU occlusion culling seems like a very commonly-desired mode.
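For example, opting a camera into GPU-only frustum culling might look like
this (import paths are approximate):
```rust
use bevy::prelude::*;
use bevy::render::view::{GpuCulling, NoCpuCulling};

fn setup(mut commands: Commands) {
    commands.spawn((
        Camera3dBundle::default(),
        // Enable GPU frustum culling (puts the view into indirect mode).
        GpuCulling,
        // Additionally turn off CPU frustum culling for this view.
        NoCpuCulling,
    ));
}
```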
Adding the `GpuCulling` component to a view puts that view into
*indirect mode*. This mode makes all drawcalls indirect, relying on the
mesh preprocessing shader to allocate instances dynamically. In indirect
mode, the `PreprocessWorkItem` `output_index` points not to a
`MeshUniform` instance slot but instead to a set of `wgpu`
`IndirectParameters`, from which it allocates an instance slot
dynamically if frustum culling succeeds. Batch building has been updated
to allocate and track indirect parameter slots, and the AABBs are now
supplied to the GPU as `MeshCullingData`.
A small amount of code relating to the frustum culling has been borrowed
from meshlets and moved into `maths.wgsl`. Note that standard Bevy
frustum culling uses AABBs, while meshlets use bounding spheres; this
means that not as much code can be shared as one might think.
This patch doesn't provide any way to perform GPU culling on shadow
maps, to avoid making this patch bigger than it already is. That can be
a followup.
## Changelog
### Added
* Frustum culling can now optionally be done on the GPU. To enable it,
add the `GpuCulling` component to a camera.
* To disable CPU frustum culling, add `NoCpuCulling` to a camera. Note
that `GpuCulling` doesn't automatically imply `NoCpuCulling`.
# Objective
- `cargo run --release --example bevymark -- --benchmark --waves 160
--per-wave 1000 --mode mesh2d` runs slower and slower over time due to
`no_gpu_preprocessing::write_batched_instance_buffer<bevy_sprite::mesh2d::mesh::Mesh2dPipeline>`
taking longer and longer because the `BatchedInstanceBuffer` is not
cleared
## Solution
- Split the `clear_batched_instance_buffers` system into CPU and GPU
versions
- Use the CPU version for 2D meshes
Currently, `MeshUniform`s are rather large: 160 bytes. They're also
somewhat expensive to compute, because they involve taking the inverse
of a 3x4 matrix. Finally, if a mesh is present in multiple views, that
mesh will have a separate `MeshUniform` for each and every view, which
is wasteful.
This commit fixes these issues by introducing the concept of a *mesh
input uniform* and adding a *mesh uniform building* compute shader pass.
The `MeshInputUniform` is simply the minimum amount of data needed for
the GPU to compute the full `MeshUniform`. Most of this data is just the
transform and is therefore only 64 bytes. `MeshInputUniform`s are
computed during the *extraction* phase, much like skins are today, in
order to avoid needlessly copying transforms around on CPU. (In fact,
the render app has been changed to only store the translation of each
mesh; it no longer cares about any other part of the transform, which is
stored only on the GPU and the main world.) Before rendering, the
`build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the
full `MeshUniform`.
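For intuition, here is a rough sketch of the kind of payload such an input
uniform carries. The real `MeshInputUniform` layout differs, but per the
description it is essentially the transform plus the index of last frame's
entry, packed into 64 bytes:
```rust
/// Sketch only: roughly what a 64-byte mesh input uniform could look like.
#[repr(C)]
#[derive(Clone, Copy)]
struct MeshInputUniformSketch {
    /// Affine model transform, packed as 3 rows of 4 floats (48 bytes).
    transform: [[f32; 4]; 3],
    /// Index of the same mesh's input uniform in the *previous* frame's
    /// buffer, so the GPU can fetch last frame's transform for TAA.
    previous_input_index: u32,
    /// Flags plus padding to round the struct up to 64 bytes.
    flags: u32,
    _pad: [u32; 2],
}
```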
The mesh uniform building pass does the following, all on GPU:
1. Copy the appropriate fields of the `MeshInputUniform` to the
`MeshUniform` slot. If a single mesh is present in multiple views, this
effectively duplicates it into each view.
2. Compute the inverse transpose of the model transform, used for
transforming normals.
3. If applicable, copy the mesh's transform from the previous frame for
TAA. To support this, we double-buffer the `MeshInputUniform`s over two
frames and swap the buffers each frame. The `MeshInputUniform`s for the
current frame contain the index of that mesh's `MeshInputUniform` for
the previous frame.
This commit produces wins in virtually every CPU part of the pipeline:
`extract_meshes`, `queue_material_meshes`,
`batch_and_prepare_render_phase`, and especially
`write_batched_instance_buffer` are all faster. Shrinking the amount of
CPU data that has to be shuffled around speeds up the entire rendering
process.
| Benchmark | This branch | `main` | Speedup |
|------------------------|-------------|---------|---------|
| `many_cubes -nfc` | 17.259 | 24.529 | 42.12% |
| `many_cubes -nfc -vpi` | 302.116 | 312.123 | 3.31% |
| `many_foxes` | 3.227 | 3.515 | 8.92% |
Because mesh uniform building requires compute shaders, and WebGL 2 has
no compute shaders, the existing CPU mesh uniform building code has been
left as-is. Many types now have both CPU mesh uniform building and GPU
mesh uniform building modes. Developers can opt into the old CPU mesh
uniform building by setting the `use_gpu_uniform_builder` option on
`PbrPlugin` to `false`.
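A sketch of opting back into CPU building via the option named above (the
exact plugin field may differ):
```rust
use bevy::pbr::PbrPlugin;
use bevy::prelude::*;

fn main() {
    App::new()
        .add_plugins(DefaultPlugins.set(PbrPlugin {
            // Fall back to building `MeshUniform`s on the CPU.
            use_gpu_uniform_builder: false,
            ..Default::default()
        }))
        .run();
}
```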
Below are graphs of the CPU portions of `many_cubes
--no-frustum-culling`. Yellow is this branch, red is `main`.
`extract_meshes`:

It's notable that we get a small win even though we're now writing to a
GPU buffer.
`queue_material_meshes`:

There's a bit of a regression here; not sure what's causing it. In any
case, it's far outweighed by the other gains.
`batch_and_prepare_render_phase`:

There's a huge win here, enough to make batching basically drop off the
profile.
`write_batched_instance_buffer`:

There's a massive improvement here, as expected. Note that a lot of it
simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't
a maintainability problem in my view because `MeshInputUniform` is so
simple: just 16 tightly-packed words.)
## Changelog
### Added
* Per-mesh instance data is now generated on GPU with a compute shader
instead of CPU, resulting in rendering performance improvements on
platforms where compute shaders are supported.
## Migration Guide
* Custom render phases now need multiple systems beyond just
`batch_and_prepare_render_phase`. Code that was previously creating
custom render phases should now add a `BinnedRenderPhasePlugin` or
`SortedRenderPhasePlugin` as appropriate instead of directly adding
`batch_and_prepare_render_phase`.