forestia/bevy

Author	SHA1	Message	Date
Patrick Walton	9da0b2a0ec	Make render phases render world resources instead of components. (#13277 ) This commit makes us stop using the render world ECS for `BinnedRenderPhase` and `SortedRenderPhase` and instead use resources with `EntityHashMap`s inside. There are three reasons to do this: 1. We can use `clear()` to clear out the render phase collections instead of recreating the components from scratch, allowing us to reuse allocations. 2. This is a prerequisite for retained bins, because components can't be retained from frame to frame in the render world, but resources can. 3. We want to move away from storing anything in components in the render world ECS, and this is a step in that direction. This patch results in a small performance benefit, due to point (1) above. ## Changelog ### Changed * The `BinnedRenderPhase` and `SortedRenderPhase` render world components have been replaced with `ViewBinnedRenderPhases` and `ViewSortedRenderPhases` resources. ## Migration Guide * The `BinnedRenderPhase` and `SortedRenderPhase` render world components have been replaced with `ViewBinnedRenderPhases` and `ViewSortedRenderPhases` resources. Instead of querying for the components, look the camera entity up in the `ViewBinnedRenderPhases`/`ViewSortedRenderPhases` tables.	2024-05-21 18:23:04 +00:00
Csányi István	f91fd322b7	Skip redundant mesh_position_local_to_world call in vertex prepass shader (#13158 ) # Objective Optimize vertex prepass shader maybe? Make it consistent with the base vertex shader ## Solution `mesh_position_local_to_clip` just calls `mesh_position_local_to_world` and then `position_world_to_clip` since `out.world_position` is getting calculated anyway a few lines below, just move it up and use it's output to calculate `out.position`. It is the same as in the base vertex shader (`mesh.wgsl`). Note: I have no idea if there is a reason that it was this way. I'm not an expert, just noticed this inconsistency while messing with custom shaders.	2024-05-14 16:31:58 +00:00
Johannes Hackel	3f5a090b1b	Add UV channel selection to StandardMaterial (#13200 ) # Objective - The StandardMaterial always uses ATTRIBUTE_UV_0 for each texture except lightmap. This is not flexible enough for a lot of gltf Files. - Fixes #12496 - Fixes #13086 - Fixes #13122 - Closes #13153 ## Solution - The StandardMaterial gets extended for each texture by an UvChannel enum. It defaults to Uv0 but can also be set to Uv1. - The gltf loader now handles the texcoord information. If the texcoord is not supported it creates a warning. - It uses StandardMaterial shader defs to define which attribute to use. ## Testing This fixes #12496 for example: ![wall_fixed](https://github.com/bevyengine/bevy/assets/688816/bc37c9e1-72ba-4e59-b092-5ee10dade603) For testing of all kind of textures I used the TextureTransformMultiTest from https://github.com/KhronosGroup/glTF-Sample-Assets/tree/main/Models/TextureTransformMultiTest Its purpose is to test multiple texture transfroms but it is also a good test for different texcoords. It also shows the issue with emission #13133. Before: ![TextureTransformMultiTest_main](https://github.com/bevyengine/bevy/assets/688816/aa701d04-5a3f-4df1-a65f-fc770ab6f4ab) After: ![TextureTransformMultiTest_texcoord](https://github.com/bevyengine/bevy/assets/688816/c3f91943-b830-4068-990f-e4f2c97771ee)	2024-05-13 18:23:09 +00:00
JMS55	e1a0da0fa6	Meshlet LOD-compatible two-pass occlusion culling (#12898 ) Keeping track of explicit visibility per cluster between frames does not work with LODs, and leads to worse culling (using the final depth buffer from the previous frame is more accurate). Instead, we need to generate a second depth pyramid after the second raster pass, and then use that in the first culling pass in the next frame to test if a cluster would have been visible last frame or not. As part of these changes, the write_index_buffer pass has been folded into the culling pass for a large performance gain, and to avoid tracking a lot of extra state that would be needed between passes. Prepass previous model/view stuff was adapted to work with meshlets as well. Also fixed a bug with materials, and other misc improvements. --------- Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Patrick Walton <pcwalton@mimiga.net> Co-authored-by: Robert Swain <robert.swain@gmail.com>	2024-04-28 05:30:20 +00:00
JMS55	17633c1f75	Remove unused push constants (#13076 ) The shader code was removed in #11280, but we never cleaned up the rust code.	2024-04-23 21:43:46 +00:00
Patrick Walton	1141e731ff	Implement alpha to coverage (A2C) support. (#12970 ) [Alpha to coverage] (A2C) replaces alpha blending with a hardware-specific multisample coverage mask when multisample antialiasing is in use. It's a simple form of [order-independent transparency] that relies on MSAA. ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"] is a good summary of the motivation for and best practices relating to A2C. This commit implements alpha to coverage support as a new variant for `AlphaMode`. You can supply `AlphaMode::AlphaToCoverage` as the `alpha_mode` field in `StandardMaterial` to use it. When in use, the standard material shader automatically applies the texture filtering method from ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"]. Objects with alpha-to-coverage materials are binned in the opaque pass, as they're fully order-independent. The `transparency_3d` example has been updated to feature an object with alpha to coverage. Happily, the example was already using MSAA. This is part of #2223, as far as I can tell. [Alpha to coverage]: https://en.wikipedia.org/wiki/Alpha_to_coverage [order-independent transparency]: https://en.wikipedia.org/wiki/Order-independent_transparency ["Anti-aliased Alpha Test: The Esoteric Alpha To Coverage"]: https://bgolus.medium.com/anti-aliased-alpha-test-the-esoteric-alpha-to-coverage-8b177335ae4f --- ## Changelog ### Added * The `AlphaMode` enum now supports `AlphaToCoverage`, to provide limited order-independent transparency when multisample antialiasing is in use.	2024-04-15 20:37:52 +00:00
Patrick Walton	5caf085dac	Divide the single `VisibleEntities` list into separate lists for 2D meshes, 3D meshes, lights, and UI elements, for performance. (#12582 ) This commit splits `VisibleEntities::entities` into four separate lists: one for lights, one for 2D meshes, one for 3D meshes, and one for UI elements. This allows `queue_material_meshes` and similar methods to avoid examining entities that are obviously irrelevant. In particular, this separation helps scenes with many skinned meshes, as the individual bones are considered visible entities but have no rendered appearance. Internally, `VisibleEntities::entities` is a `HashMap` from the `TypeId` representing a `QueryFilter` to the appropriate `Entity` list. I had to do this because `VisibleEntities` is located within an upstream crate from the crates that provide lights (`bevy_pbr`) and 2D meshes (`bevy_sprite`). As an added benefit, this setup allows apps to provide their own types of renderable components, by simply adding a specialized `check_visibility` to the schedule. This provides a 16.23% end-to-end speedup on `many_foxes` with 10,000 foxes (24.06 ms/frame to 20.70 ms/frame). ## Migration guide * `check_visibility` and `VisibleEntities` now store the four types of renderable entities--2D meshes, 3D meshes, lights, and UI elements--separately. If your custom rendering code examines `VisibleEntities`, it will now need to specify which type of entity it's interested in using the `WithMesh2d`, `WithMesh`, `WithLight`, and `WithNode` types respectively. If your app introduces a new type of renderable entity, you'll need to add an explicit call to `check_visibility` to the schedule to accommodate your new component or components. ## Analysis `many_foxes`, 10,000 foxes: `main`: ![Screenshot 2024-03-31 114444](https://github.com/bevyengine/bevy/assets/157897/16ecb2ff-6e04-46c0-a4b0-b2fde2084bad) `many_foxes`, 10,000 foxes, this branch: ![Screenshot 2024-03-31 114256](https://github.com/bevyengine/bevy/assets/157897/94dedae4-bd00-45b2-9aaf-dfc237004ddb) `queue_material_meshes` (yellow = this branch, red = `main`): ![Screenshot 2024-03-31 114637](https://github.com/bevyengine/bevy/assets/157897/f90912bd-45bd-42c4-bd74-57d98a0f036e) `queue_shadows` (yellow = this branch, red = `main`): ![Screenshot 2024-03-31 114607](https://github.com/bevyengine/bevy/assets/157897/6ce693e3-20c0-4234-8ec9-a6f191299e2d)	2024-04-11 20:33:20 +00:00
Patrick Walton	11817f4ba4	Generate `MeshUniform`s on the GPU via compute shader where available. (#12773 ) Currently, `MeshUniform`s are rather large: 160 bytes. They're also somewhat expensive to compute, because they involve taking the inverse of a 3x4 matrix. Finally, if a mesh is present in multiple views, that mesh will have a separate `MeshUniform` for each and every view, which is wasteful. This commit fixes these issues by introducing the concept of a mesh input uniform and adding a mesh uniform building compute shader pass. The `MeshInputUniform` is simply the minimum amount of data needed for the GPU to compute the full `MeshUniform`. Most of this data is just the transform and is therefore only 64 bytes. `MeshInputUniform`s are computed during the extraction phase, much like skins are today, in order to avoid needlessly copying transforms around on CPU. (In fact, the render app has been changed to only store the translation of each mesh; it no longer cares about any other part of the transform, which is stored only on the GPU and the main world.) Before rendering, the `build_mesh_uniforms` pass runs to expand the `MeshInputUniform`s to the full `MeshUniform`. The mesh uniform building pass does the following, all on GPU: 1. Copy the appropriate fields of the `MeshInputUniform` to the `MeshUniform` slot. If a single mesh is present in multiple views, this effectively duplicates it into each view. 2. Compute the inverse transpose of the model transform, used for transforming normals. 3. If applicable, copy the mesh's transform from the previous frame for TAA. To support this, we double-buffer the `MeshInputUniform`s over two frames and swap the buffers each frame. The `MeshInputUniform`s for the current frame contain the index of that mesh's `MeshInputUniform` for the previous frame. This commit produces wins in virtually every CPU part of the pipeline: `extract_meshes`, `queue_material_meshes`, `batch_and_prepare_render_phase`, and especially `write_batched_instance_buffer` are all faster. Shrinking the amount of CPU data that has to be shuffled around speeds up the entire rendering process. \| Benchmark \| This branch \| `main` \| Speedup \| \|------------------------\|-------------\|---------\|---------\| \| `many_cubes -nfc` \| 17.259 \| 24.529 \| 42.12% \| \| `many_cubes -nfc -vpi` \| 302.116 \| 312.123 \| 3.31% \| \| `many_foxes` \| 3.227 \| 3.515 \| 8.92% \| Because mesh uniform building requires compute shader, and WebGL 2 has no compute shader, the existing CPU mesh uniform building code has been left as-is. Many types now have both CPU mesh uniform building and GPU mesh uniform building modes. Developers can opt into the old CPU mesh uniform building by setting the `use_gpu_uniform_builder` option on `PbrPlugin` to `false`. Below are graphs of the CPU portions of `many-cubes --no-frustum-culling`. Yellow is this branch, red is `main`. `extract_meshes`: ![Screenshot 2024-04-02 124842](https://github.com/bevyengine/bevy/assets/157897/a6748ea4-dd05-47b6-9254-45d07d33cb10) It's notable that we get a small win even though we're now writing to a GPU buffer. `queue_material_meshes`: ![Screenshot 2024-04-02 124911](https://github.com/bevyengine/bevy/assets/157897/ecb44d78-65dc-448d-ba85-2de91aa2ad94) There's a bit of a regression here; not sure what's causing it. In any case it's very outweighed by the other gains. `batch_and_prepare_render_phase`: ![Screenshot 2024-04-02 125123](https://github.com/bevyengine/bevy/assets/157897/4e20fc86-f9dd-4e5c-8623-837e4258f435) There's a huge win here, enough to make batching basically drop off the profile. `write_batched_instance_buffer`: ![Screenshot 2024-04-02 125237](https://github.com/bevyengine/bevy/assets/157897/401a5c32-9dc1-4991-996d-eb1cac6014b2) There's a massive improvement here, as expected. Note that a lot of it simply comes from the fact that `MeshInputUniform` is `Pod`. (This isn't a maintainability problem in my view because `MeshInputUniform` is so simple: just 16 tightly-packed words.) ## Changelog ### Added * Per-mesh instance data is now generated on GPU with a compute shader instead of CPU, resulting in rendering performance improvements on platforms where compute shaders are supported. ## Migration guide * Custom render phases now need multiple systems beyond just `batch_and_prepare_render_phase`. Code that was previously creating custom render phases should now add a `BinnedRenderPhasePlugin` or `SortedRenderPhasePlugin` as appropriate instead of directly adding `batch_and_prepare_render_phase`.	2024-04-10 05:33:32 +00:00
Robert Swain	ab7cbfa8fc	Consolidate Render(Ui)Materials(2d) into RenderAssets (#12827 ) # Objective - Replace `RenderMaterials` / `RenderMaterials2d` / `RenderUiMaterials` with `RenderAssets` to enable implementing changes to one thing, `RenderAssets`, that applies to all use cases rather than duplicating changes everywhere for multiple things that should be one thing. - Adopts #8149 ## Solution - Make RenderAsset generic over the destination type rather than the source type as in #8149 - Use `RenderAssets<PreparedMaterial<M>>` etc for render materials --- ## Changelog - Changed: - The `RenderAsset` trait is now implemented on the destination type. Its `SourceAsset` associated type refers to the type of the source asset. - `RenderMaterials`, `RenderMaterials2d`, and `RenderUiMaterials` have been replaced by `RenderAssets<PreparedMaterial<M>>` and similar. ## Migration Guide - `RenderAsset` is now implemented for the destination type rather that the source asset type. The source asset type is now the `RenderAsset` trait's `SourceAsset` associated type.	2024-04-09 13:26:34 +00:00
JMS55	31b5943ad4	Add previous_view_uniforms.inverse_view (#12902 ) # Objective - Upload previous frame's inverse_view matrix to the GPU for use with https://github.com/bevyengine/bevy/pull/12898. --- ## Changelog - Added `prepass_bindings::previous_view_uniforms.inverse_view`. - Renamed `prepass_bindings::previous_view_proj` to `prepass_bindings::previous_view_uniforms.view_proj`. - Renamed `PreviousViewProjectionUniformOffset` to `PreviousViewUniformOffset`. - Renamed `PreviousViewProjection` to `PreviousViewData`. ## Migration Guide - Renamed `prepass_bindings::previous_view_proj` to `prepass_bindings::previous_view_uniforms.view_proj`. - Renamed `PreviousViewProjectionUniformOffset` to `PreviousViewUniformOffset`. - Renamed `PreviousViewProjection` to `PreviousViewData`.	2024-04-07 18:59:16 +00:00
Patrick Walton	37522fd0ae	Micro-optimize `queue_material_meshes`, primarily to remove bit manipulation. (#12791 ) This commit makes the following optimizations: ## `MeshPipelineKey`/`BaseMeshPipelineKey` split `MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render` and `MeshPipelineKey`, which lives in `bevy_pbr`. Conceptually, `BaseMeshPipelineKey` is a superclass of `MeshPipelineKey`. For `BaseMeshPipelineKey`, the bits start at the highest (most significant) bit and grow downward toward the lowest bit; for `MeshPipelineKey`, the bits start at the lowest bit and grow upward toward the highest bit. This prevents them from colliding. The goal of this is to avoid having to reassemble bits of the pipeline key for every mesh every frame. Instead, we can just use a bitwise or operation to combine the pieces that make up a `MeshPipelineKey`. ## `specialize_slow` Previously, all of `specialize()` was marked as `#[inline]`. This bloated `queue_material_meshes` unnecessarily, as a large chunk of it ended up being a slow path that was rarely hit. This commit refactors the function to move the slow path to `specialize_slow()`. Together, these two changes shave about 5% off `queue_material_meshes`: ![Screenshot 2024-03-29 130002](https://github.com/bevyengine/bevy/assets/157897/a7e5a994-a807-4328-b314-9003429dcdd2) ## Migration Guide - The `primitive_topology` field on `GpuMesh` is now an accessor method: `GpuMesh::primitive_topology()`. - For performance reasons, `MeshPipelineKey` has been split into `BaseMeshPipelineKey`, which lives in `bevy_render`, and `MeshPipelineKey`, which lives in `bevy_pbr`. These two should be combined with bitwise-or to produce the final `MeshPipelineKey`.	2024-04-01 21:58:53 +00:00
Cameron	01649f13e2	Refactor `App` and `SubApp` internals for better separation (#9202 ) # Objective This is a necessary precursor to #9122 (this was split from that PR to reduce the amount of code to review all at once). Moving `!Send` resource ownership to `App` will make it unambiguously `!Send`. `SubApp` must be `Send`, so it can't wrap `App`. ## Solution Refactor `App` and `SubApp` to not have a recursive relationship. Since `SubApp` no longer wraps `App`, once `!Send` resources are moved out of `World` and into `App`, `SubApp` will become unambiguously `Send`. There could be less code duplication between `App` and `SubApp`, but that would break `App` method chaining. ## Changelog - `SubApp` no longer wraps `App`. - `App` fields are no longer publicly accessible. - `App` can no longer be converted into a `SubApp`. - Various methods now return references to a `SubApp` instead of an `App`. ## Migration Guide - To construct a sub-app, use `SubApp::new()`. `App` can no longer convert into `SubApp`. - If you implemented a trait for `App`, you may want to implement it for `SubApp` as well. - If you're accessing `app.world` directly, you now have to use `app.world()` and `app.world_mut()`. - `App::sub_app` now returns `&SubApp`. - `App::sub_app_mut` now returns `&mut SubApp`. - `App::get_sub_app` now returns `Option<&SubApp>.` - `App::get_sub_app_mut` now returns `Option<&mut SubApp>.`	2024-03-31 03:16:10 +00:00
Patrick Walton	4dadebd9c4	Improve performance by binning together opaque items instead of sorting them. (#12453 ) Today, we sort all entities added to all phases, even the phases that don't strictly need sorting, such as the opaque and shadow phases. This results in a performance loss because our `PhaseItem`s are rather large in memory, so sorting is slow. Additionally, determining the boundaries of batches is an O(n) process. This commit makes Bevy instead applicable place phase items into bins keyed by bin keys, which have the invariant that everything in the same bin is potentially batchable. This makes determining batch boundaries O(1), because everything in the same bin can be batched. Instead of sorting each entity, we now sort only the bin keys. This drops the sorting time to near-zero on workloads with few bins like `many_cubes --no-frustum-culling`. Memory usage is improved too, with batch boundaries and dynamic indices now implicit instead of explicit. The improved memory usage results in a significant win even on unbatchable workloads like `many_cubes --no-frustum-culling --vary-material-data-per-instance`, presumably due to cache effects. Not all phases can be binned; some, such as transparent and transmissive phases, must still be sorted. To handle this, this commit splits `PhaseItem` into `BinnedPhaseItem` and `SortedPhaseItem`. Most of the logic that today deals with `PhaseItem`s has been moved to `SortedPhaseItem`. `BinnedPhaseItem` has the new logic. Frame time results (in ms/frame) are as follows: \| Benchmark \| `binning` \| `main` \| Speedup \| \| ------------------------ \| --------- \| ------- \| ------- \| \| `many_cubes -nfc -vpi` \| 232.179 \| 312.123 \| 34.43% \| \| `many_cubes -nfc` \| 25.874 \| 30.117 \| 16.40% \| \| `many_foxes` \| 3.276 \| 3.515 \| 7.30% \| (`-nfc` is short for `--no-frustum-culling`; `-vpi` is short for `--vary-per-instance`.) --- ## Changelog ### Changed * Render phases have been split into binned and sorted phases. Binned phases, such as the common opaque phase, achieve improved CPU performance by avoiding the sorting step. ## Migration Guide - `PhaseItem` has been split into `BinnedPhaseItem` and `SortedPhaseItem`. If your code has custom `PhaseItem`s, you will need to migrate them to one of these two types. `SortedPhaseItem` requires the fewest code changes, but you may want to pick `BinnedPhaseItem` if your phase doesn't require sorting, as that enables higher performance. ## Tracy graphs `many-cubes --no-frustum-culling`, `main` branch: <img width="1064" alt="Screenshot 2024-03-12 180037" src="https://github.com/bevyengine/bevy/assets/157897/e1180ce8-8e89-46d2-85e3-f59f72109a55"> `many-cubes --no-frustum-culling`, this branch: <img width="1064" alt="Screenshot 2024-03-12 180011" src="https://github.com/bevyengine/bevy/assets/157897/0899f036-6075-44c5-a972-44d95895f46c"> You can see that `batch_and_prepare_binned_render_phase` is a much smaller fraction of the time. Zooming in on that function, with yellow being this branch and red being `main`, we see: <img width="1064" alt="Screenshot 2024-03-12 175832" src="https://github.com/bevyengine/bevy/assets/157897/0dfc8d3f-49f4-496e-8825-a66e64d356d0"> The binning happens in `queue_material_meshes`. Again with yellow being this branch and red being `main`: <img width="1064" alt="Screenshot 2024-03-12 175755" src="https://github.com/bevyengine/bevy/assets/157897/b9b20dc1-11c8-400c-a6cc-1c2e09c1bb96"> We can see that there is a small regression in `queue_material_meshes` performance, but it's not nearly enough to outweigh the large gains in `batch_and_prepare_binned_render_phase`. --------- Co-authored-by: James Liu <contact@jamessliu.com>	2024-03-30 02:55:02 +00:00
JMS55	4f20faaa43	Meshlet rendering (initial feature) (#10164 ) # Objective - Implements a more efficient, GPU-driven (https://github.com/bevyengine/bevy/issues/1342) rendering pipeline based on meshlets. - Meshes are split into small clusters of triangles called meshlets, each of which acts as a mini index buffer into the larger mesh data. Meshlets can be compressed, streamed, culled, and batched much more efficiently than monolithic meshes. ![image](https://github.com/bevyengine/bevy/assets/47158642/cb2aaad0-7a9a-4e14-93b0-15d4e895b26a) ![image](https://github.com/bevyengine/bevy/assets/47158642/7534035b-1eb7-4278-9b99-5322e4401715) # Misc * Future work: https://github.com/bevyengine/bevy/issues/11518 * Nanite reference: https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf Two pass occlusion culling explained very well: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501 --------- Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com>	2024-03-25 19:08:27 +00:00
Gino Valente	4c47e31be6	bevy_reflect: Remove `U32Visitor` (#12433 ) # Objective The `U32Visitor` struct has been unused since its introduction in #6140. It's made itself known now by causing a recent [CI failure](https://github.com/bevyengine/bevy/actions/runs/8243333274/job/22543736624). ## Solution Remove the unused `U32Visitor` struct. Also removed `PrepassLightsViewFlush` as it was causing a [similar CI failure](https://github.com/bevyengine/bevy/actions/runs/8243838066/job/22545103746?pr=12433#step:6:269) on this PR.	2024-03-12 06:19:29 +00:00
Patrick Walton	f9cc91d5a1	Intern mesh vertex buffer layouts so that we don't have to compare them over and over. (#12216 ) Although we cached hashes of `MeshVertexBufferLayout`, we were paying the cost of `PartialEq` on `InnerMeshVertexBufferLayout` for every entity, every frame. This patch changes that logic to place `MeshVertexBufferLayout`s in `Arc`s so that they can be compared and hashed by pointer. This results in a 28% speedup in the `queue_material_meshes` phase of `many_cubes`, with frustum culling disabled. Additionally, this patch contains two minor changes: 1. This commit flattens the specialized mesh pipeline cache to one level of hash tables instead of two. This saves a hash lookup. 2. The example `many_cubes` has been given a `--no-frustum-culling` flag, to aid in benchmarking. See the Tracy profile: <img width="1064" alt="Screenshot 2024-02-29 144406" src="https://github.com/bevyengine/bevy/assets/157897/18632f1d-1fdd-4ac7-90ed-2d10306b2a1e"> ## Migration guide * Duplicate `MeshVertexBufferLayout`s are now combined into a single object, `MeshVertexBufferLayoutRef`, which contains an atomically-reference-counted pointer to the layout. Code that was using `MeshVertexBufferLayout` may need to be updated to use `MeshVertexBufferLayoutRef` instead.	2024-03-01 20:56:21 +00:00
Elabajaba	78b6fa1f1b	sort alpha masked pipelines by pipeline & mesh instead of by distance (#12117 ) # Objective - followup to https://github.com/bevyengine/bevy/pull/11671 - I forgot to change the alpha masked phases. ## Solution - Change the sorting for alpha mask phases to sort by pipeline+mesh instead of distance, for much better batching for alpha masked materials. I also fixed some docs that I missed in the previous PR. --- ## Changelog - Alpha masked materials are now sorted by pipeline and mesh.	2024-02-26 11:14:59 +00:00
eri	5f8f3b532c	Check `cfg` during CI and fix feature typos (#12103 ) # Objective - Add the new `-Zcheck-cfg` checks to catch more warnings - Fixes #12091 ## Solution - Create a new `cfg-check` to the CI that runs `cargo check -Zcheck-cfg --workspace` using cargo nightly (and fails if there are warnings) - Fix all warnings generated by the new check --- ## Changelog - Remove all redundant imports - Fix cfg wasm32 targets - Add 3 dead code exceptions (should StandardColor be unused?) - Convert ios_simulator to a feature (I'm not sure if this is the right way to do it, but the check complained before) ## Migration Guide No breaking changes --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>	2024-02-25 15:19:27 +00:00
Patrick Walton	3af8526786	Stop extracting mesh entities to the render world. (#11803 ) This fixes a `FIXME` in `extract_meshes` and results in a performance improvement. As a result of this change, meshes in the render world might not be attached to entities anymore. Therefore, the `entity` parameter to `RenderCommand::render()` is now wrapped in an `Option`. Most applications that use the render app's ECS can simply unwrap the `Option`. Note that for now sprites, gizmos, and UI elements still use the render world as usual. ## Migration guide * For efficiency reasons, some meshes in the render world may not have corresponding `Entity` IDs anymore. As a result, the `entity` parameter to `RenderCommand::render()` is now wrapped in an `Option`. Custom rendering code may need to be updated to handle the case in which no `Entity` exists for an object that is to be rendered.	2024-02-10 10:46:10 +00:00
Elabajaba	2a1ebc4ac4	sort by pipeline then mesh for non transparent passes for massively better batching (#11671 ) # Objective Bevy does ridiculous amount of drawcalls, and our batching isn't very effective because we sort by distance and only batch if we get multiple of the same object in a row. This can give us slightly better GPU performance when not using the depth prepass (due to less overdraw), but ends up being massively CPU bottlenecked due to doing thousands of unnecessary drawcalls. ## Solution Change the sort functions to sort by pipeline key then by mesh id for large performance gains in more realistic scenes than our stress tests. Pipelines changed: - Opaque3d - Opaque3dDeferred - Opaque3dPrepass ![image](https://github.com/bevyengine/bevy/assets/177631/8c355256-ad86-4b47-81a0-f3906797fe7e) --- ## Changelog - Opaque3d drawing order is now sorted by pipeline and mesh, rather than by distance. This trades off a bit of GPU time in exchange for massively better batching in scenes that aren't only drawing huge amounts of a single object.	2024-02-05 22:12:22 +00:00
Elabajaba	35ac1b152e	Update to wgpu 0.19 and raw-window-handle 0.6 (#11280 ) # Objective Keep core dependencies up to date. ## Solution Update the dependencies. wgpu 0.19 only supports raw-window-handle (rwh) 0.6, so bumping that was included in this. The rwh 0.6 version bump is just the simplest way of doing it. There might be a way we can take advantage of wgpu's new safe surface creation api, but I'm not familiar enough with bevy's window management to untangle it and my attempt ended up being a mess of lifetimes and rustc complaining about missing trait impls (that were implemented). Thanks to @MiniaczQ for the (much simpler) rwh 0.6 version bump code. Unblocks https://github.com/bevyengine/bevy/pull/9172 and https://github.com/bevyengine/bevy/pull/10812 ~~This might be blocked on cpal and oboe updating their ndk versions to 0.8, as they both currently target ndk 0.7 which uses rwh 0.5.2~~ Tested on android, and everything seems to work correctly (audio properly stops when minimized, and plays when re-focusing the app). --- ## Changelog - `wgpu` has been updated to 0.19! The long awaited arcanization has been merged (for more info, see https://gfx-rs.github.io/2023/11/24/arcanization.html), and Vulkan should now be working again on Intel GPUs. - Targeting WebGPU now requires that you add the new `webgpu` feature (setting the `RUSTFLAGS` environment variable to `--cfg=web_sys_unstable_apis` is still required). This feature currently overrides the `webgl2` feature if you have both enabled (the `webgl2` feature is enabled by default), so it is not recommended to add it as a default feature to libraries without putting it behind a flag that allows library users to opt out of it! In the future we plan on supporting wasm binaries that can target both webgl2 and webgpu now that wgpu added support for doing so (see https://github.com/bevyengine/bevy/issues/11505). - `raw-window-handle` has been updated to version 0.6. ## Migration Guide - `bevy_render::instance_index::get_instance_index()` has been removed as the webgl2 workaround is no longer required as it was fixed upstream in wgpu. The `BASE_INSTANCE_WORKAROUND` shaderdef has also been removed. - WebGPU now requires the new `webgpu` feature to be enabled. The `webgpu` feature currently overrides the `webgl2` feature so you no longer need to disable all default features and re-add them all when targeting `webgpu`, but binaries built with both the `webgpu` and `webgl2` features will only target the webgpu backend, and will only work on browsers that support WebGPU. - Places where you conditionally compiled things for webgl2 need to be updated because of this change, eg: - `#[cfg(any(not(feature = "webgl"), not(target_arch = "wasm32")))]` becomes `#[cfg(any(not(feature = "webgl") ,not(target_arch = "wasm32"), feature = "webgpu"))]` - `#[cfg(all(feature = "webgl", target_arch = "wasm32"))]` becomes `#[cfg(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))]` - `if cfg!(all(feature = "webgl", target_arch = "wasm32"))` becomes `if cfg!(all(feature = "webgl", target_arch = "wasm32", not(feature = "webgpu")))` - `create_texture_with_data` now also takes a `TextureDataOrder`. You can probably just set this to `TextureDataOrder::default()` - `TextureFormat`'s `block_size` has been renamed to `block_copy_size` - See the `wgpu` changelog for anything I might've missed: https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md --------- Co-authored-by: François <mockersf@gmail.com>	2024-01-26 18:14:21 +00:00
JMS55	a796d53a05	Meshlet prep (#11442 ) # Objective - Prep for https://github.com/bevyengine/bevy/pull/10164 - Make deferred_lighting_pass_id a ColorAttachment - Correctly extract shadow view frusta so that the view uniforms get populated - Make some needed things public - Misc formatting	2024-01-22 15:28:33 +00:00
Alice Cecile	eb07d16871	Revert rendering-related associated type name changes (#11027 ) # Objective > Can anyone explain to me the reasoning of renaming all the types named Query to Data. I'm talking about this PR https://github.com/bevyengine/bevy/pull/10779 It doesn't make sense to me that a bunch of types that are used to run queries aren't named Query anymore. Like ViewQuery on the ViewNode is the type of the Query. I don't really understand the point of the rename, it just seems like it hides the fact that a query will run based on those types. [@IceSentry](https://discord.com/channels/691052431525675048/692572690833473578/1184946251431694387) ## Solution Revert several renames in #10779. ## Changelog - `ViewNode::ViewData` is now `ViewNode::ViewQuery` again. ## Migration Guide - This PR amends the migration guide in https://github.com/bevyengine/bevy/pull/10779 --------- Co-authored-by: atlas dostal <rodol@rivalrebels.com>	2024-01-22 15:01:55 +00:00
Jakob Hellermann	a657478675	resolve all internal ambiguities (#10411 ) - ignore all ambiguities that are not a problem - remove `.before(Assets::<Image>::track_assets),` that points into a different schedule (-> should this be caught?) - add some explicit orderings: - run `poll_receivers` and `update_accessibility_nodes` after `window_closed` in `bevy_winit::accessibility` - run `bevy_ui::accessibility::calc_bounds` after `CameraUpdateSystem` - run ` bevy_text::update_text2d_layout` and `bevy_ui::text_system` after `font_atlas_set::remove_dropped_font_atlas_sets` - add `app.ignore_ambiguity(a, b)` function for cases where you want to ignore an ambiguity between two independent plugins `A` and `B` - add `IgnoreAmbiguitiesPlugin` in `DefaultPlugins` that allows cross-crate ambiguities like `bevy_animation`/`bevy_ui` - Fixes https://github.com/bevyengine/bevy/issues/9511 ## Before Render ![render_schedule_Render dot](https://github.com/bevyengine/bevy/assets/22177966/1c677968-7873-40cc-848c-91fca4c8e383) PostUpdate ![schedule_PostUpdate dot](https://github.com/bevyengine/bevy/assets/22177966/8fc61304-08d4-4533-8110-c04113a7367a) ## After Render ![render_schedule_Render dot](https://github.com/bevyengine/bevy/assets/22177966/462f3b28-cef7-4833-8619-1f5175983485) PostUpdate ![schedule_PostUpdate dot](https://github.com/bevyengine/bevy/assets/22177966/8cfb3d83-7842-4a84-9082-46177e1a6c70) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: Alice Cecile <alice.i.cecil@gmail.com> Co-authored-by: François <mockersf@gmail.com>	2024-01-09 19:08:15 +00:00
Patrick Walton	5697fee3ad	Bump the vertex attribute index for prepass joints. (#11191 ) This was missed in #10231. Fixes #11190.	2024-01-03 10:35:39 +00:00
Patrick Walton	dd14f3a477	Implement lightmaps. (#10231 ) ![Screenshot](https://i.imgur.com/A4KzWFq.png) # Objective Lightmaps, textures that store baked global illumination, have been a mainstay of real-time graphics for decades. Bevy currently has no support for them, so this pull request implements them. ## Solution The new `Lightmap` component can be attached to any entity that contains a `Handle<Mesh>` and a `StandardMaterial`. When present, it will be applied in the PBR shader. Because multiple lightmaps are frequently packed into atlases, each lightmap may have its own UV boundaries within its texture. An `exposure` field is also provided, to control the brightness of the lightmap. Note that this PR doesn't provide any way to bake the lightmaps. That can be done with [The Lightmapper] or another solution, such as Unity's Bakery. --- ## Changelog ### Added * A new component, `Lightmap`, is available, for baked global illumination. If your mesh has a second UV channel (UV1), and you attach this component to the entity with that mesh, Bevy will apply the texture referenced in the lightmap. [The Lightmapper]: https://github.com/Naxela/The_Lightmapper --------- Co-authored-by: Carter Anderson <mcanders1@gmail.com>	2024-01-02 20:38:47 +00:00
JMS55	70b0eacc3b	Keep track of when a texture is first cleared (#10325 ) # Objective - Custom render passes, or future passes in the engine (such as https://github.com/bevyengine/bevy/pull/10164) need a better way to know and indicate to the core passes whether the view color/depth/prepass attachments have been cleared or not yet this frame, to know if they should clear it themselves or load it. ## Solution - For all render targets (depth textures, shadow textures, prepass textures, main textures) use an atomic bool to track whether or not each texture has been cleared this frame. Abstracted away in the new ColorAttachment and DepthAttachment wrappers. --- ## Changelog - Changed `ViewTarget::get_color_attachment()`, removed arguments. - Changed `ViewTarget::get_unsampled_color_attachment()`, removed arguments. - Removed `Camera3d::clear_color`. - Removed `Camera2d::clear_color`. - Added `Camera::clear_color`. - Added `ExtractedCamera::clear_color`. - Added `ColorAttachment` and `DepthAttachment` wrappers. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - Core render passes now track when a texture is first bound as an attachment in order to decide whether to clear or load it. ## Migration Guide - Remove arguments to `ViewTarget::get_color_attachment()` and `ViewTarget::get_unsampled_color_attachment()`. - Configure clear color on `Camera` instead of on `Camera3d` and `Camera2d`. - Moved `ClearColor` and `ClearColorConfig` from `bevy::core_pipeline::clear_color` to `bevy::render::camera`. - `ViewDepthTexture` must now be created via the `new()` method --------- Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>	2023-12-31 00:37:37 +00:00
Mantas	5af2f022d8	Rename `WorldQueryData` & `WorldQueryFilter` to `QueryData` & `QueryFilter` (#10779 ) # Rename `WorldQueryData` & `WorldQueryFilter` to `QueryData` & `QueryFilter` Fixes #10776 ## Solution Traits `WorldQueryData` & `WorldQueryFilter` were renamed to `QueryData` and `QueryFilter`, respectively. Related Trait types were also renamed. --- ## Changelog - Trait `WorldQueryData` has been renamed to `QueryData`. Derive macro's `QueryData` attribute `world_query_data` has been renamed to `query_data`. - Trait `WorldQueryFilter` has been renamed to `QueryFilter`. Derive macro's `QueryFilter` attribute `world_query_filter` has been renamed to `query_filter`. - Trait's `ExtractComponent` type `Query` has been renamed to `Data`. - Trait's `GetBatchData` types `Query` & `QueryFilter` has been renamed to `Data` & `Filter`, respectively. - Trait's `ExtractInstance` type `Query` has been renamed to `Data`. - Trait's `ViewNode` type `ViewQuery` has been renamed to `ViewData`. - Trait's `RenderCommand` types `ViewWorldQuery` & `ItemWorldQuery` has been renamed to `ViewData` & `ItemData`, respectively. ## Migration Guide Note: if merged before 0.13 is released, this should instead modify the migration guide of #10776 with the updated names. - Rename `WorldQueryData` & `WorldQueryFilter` trait usages to `QueryData` & `QueryFilter` and their respective derive macro attributes `world_query_data` & `world_query_filter` to `query_data` & `query_filter`. - Rename the following trait type usages: - Trait's `ExtractComponent` type `Query` to `Data`. - Trait's `GetBatchData` type `Query` to `Data`. - Trait's `ExtractInstance` type `Query` to `Data`. - Trait's `ViewNode` type `ViewQuery` to `ViewData`' - Trait's `RenderCommand` types `ViewWolrdQuery` & `ItemWorldQuery` to `ViewData` & `ItemData`, respectively. ```rust // Before #[derive(WorldQueryData)] #[world_query_data(derive(Debug))] struct EmptyQuery { empty: (), } // After #[derive(QueryData)] #[query_data(derive(Debug))] struct EmptyQuery { empty: (), } // Before #[derive(WorldQueryFilter)] struct CustomQueryFilter<T: Component, P: Component> { _c: With<ComponentC>, _d: With<ComponentD>, _or: Or<(Added<ComponentC>, Changed<ComponentD>, Without<ComponentZ>)>, _generic_tuple: (With<T>, With<P>), } // After #[derive(QueryFilter)] struct CustomQueryFilter<T: Component, P: Component> { _c: With<ComponentC>, _d: With<ComponentD>, _or: Or<(Added<ComponentC>, Changed<ComponentD>, Without<ComponentZ>)>, _generic_tuple: (With<T>, With<P>), } // Before impl ExtractComponent for ContrastAdaptiveSharpeningSettings { type Query = &'static Self; type Filter = With<Camera>; type Out = (DenoiseCAS, CASUniform); fn extract_component(item: QueryItem<Self::Query>) -> Option<Self::Out> { //... } } // After impl ExtractComponent for ContrastAdaptiveSharpeningSettings { type Data = &'static Self; type Filter = With<Camera>; type Out = (DenoiseCAS, CASUniform); fn extract_component(item: QueryItem<Self::Data>) -> Option<Self::Out> { //... } } // Before impl GetBatchData for MeshPipeline { type Param = SRes<RenderMeshInstances>; type Query = Entity; type QueryFilter = With<Mesh3d>; type CompareData = (MaterialBindGroupId, AssetId<Mesh>); type BufferData = MeshUniform; fn get_batch_data( mesh_instances: &SystemParamItem<Self::Param>, entity: &QueryItem<Self::Query>, ) -> (Self::BufferData, Option<Self::CompareData>) { // .... } } // After impl GetBatchData for MeshPipeline { type Param = SRes<RenderMeshInstances>; type Data = Entity; type Filter = With<Mesh3d>; type CompareData = (MaterialBindGroupId, AssetId<Mesh>); type BufferData = MeshUniform; fn get_batch_data( mesh_instances: &SystemParamItem<Self::Param>, entity: &QueryItem<Self::Data>, ) -> (Self::BufferData, Option<Self::CompareData>) { // .... } } // Before impl<A> ExtractInstance for AssetId<A> where A: Asset, { type Query = Read<Handle<A>>; type Filter = (); fn extract(item: QueryItem<'_, Self::Query>) -> Option<Self> { Some(item.id()) } } // After impl<A> ExtractInstance for AssetId<A> where A: Asset, { type Data = Read<Handle<A>>; type Filter = (); fn extract(item: QueryItem<'_, Self::Data>) -> Option<Self> { Some(item.id()) } } // Before impl ViewNode for PostProcessNode { type ViewQuery = ( &'static ViewTarget, &'static PostProcessSettings, ); fn run( &self, _graph: &mut RenderGraphContext, render_context: &mut RenderContext, (view_target, _post_process_settings): QueryItem<Self::ViewQuery>, world: &World, ) -> Result<(), NodeRunError> { // ... } } // After impl ViewNode for PostProcessNode { type ViewData = ( &'static ViewTarget, &'static PostProcessSettings, ); fn run( &self, _graph: &mut RenderGraphContext, render_context: &mut RenderContext, (view_target, _post_process_settings): QueryItem<Self::ViewData>, world: &World, ) -> Result<(), NodeRunError> { // ... } } // Before impl<P: CachedRenderPipelinePhaseItem> RenderCommand<P> for SetItemPipeline { type Param = SRes<PipelineCache>; type ViewWorldQuery = (); type ItemWorldQuery = (); #[inline] fn render<'w>( item: &P, _view: (), _entity: (), pipeline_cache: SystemParamItem<'w, '_, Self::Param>, pass: &mut TrackedRenderPass<'w>, ) -> RenderCommandResult { // ... } } // After impl<P: CachedRenderPipelinePhaseItem> RenderCommand<P> for SetItemPipeline { type Param = SRes<PipelineCache>; type ViewData = (); type ItemData = (); #[inline] fn render<'w>( item: &P, _view: (), _entity: (), pipeline_cache: SystemParamItem<'w, '_, Self::Param>, pass: &mut TrackedRenderPass<'w>, ) -> RenderCommandResult { // ... } } ```	2023-12-12 19:45:50 +00:00
Elabajaba	0f5d8128c9	Fix prepass binding issues causing crashes when not all prepass bindings are used (#10788 ) # Objective Fixes https://github.com/bevyengine/bevy/issues/10786 ## Solution The bind_group_layout entries for the prepass were wrong when not all 4 prepass textures were used, as it just zipped [17, 18, 19, 20] with the smallvec of prepass `bind_group_layout` entries that potentially didn't contain 4 entries. (eg. if you had a depth and motion vector prepass but no normal prepass, then depth would be correct but the entry for the motion vector prepass would be 18 (normal prepass' spot) instead of 19). Change the prepass `get_bind_group_layout_entries` function to return an array of `[Option<BindGroupLayoutEntryBuilder>; 4]` and only add the layout entry if it exists.	2023-11-29 23:11:12 +00:00
tygyh	fd308571c4	Remove unnecessary path prefixes (#10749 ) # Objective - Shorten paths by removing unnecessary prefixes ## Solution - Remove the prefixes from many paths which do not need them. Finding the paths was done automatically using built-in refactoring tools in Jetbrains RustRover.	2023-11-28 23:43:40 +00:00
JMS55	4bf20e7d27	Swap material and mesh bind groups (#10485 ) # Objective - Materials should be a more frequent rebind then meshes (due to being able to use a single vertex buffer, such as in #10164) and therefore should be in a higher bind group. --- ## Changelog - For 2d and 3d mesh/material setups (but not UI materials, or other rendering setups such as gizmos, sprites, or text), mesh data is now in bind group 1, and material data is now in bind group 2, which is swapped from how they were before. ## Migration Guide - Custom 2d and 3d mesh/material shaders should now use bind group 2 `@group(2) @binding(x)` for their bound resources, instead of bind group 1. - Many internal pieces of rendering code have changed so that mesh data is now in bind group 1, and material data is now in bind group 2. Semi-custom rendering setups (that don't use the Material or Material2d APIs) should adapt to these changes.	2023-11-28 22:26:22 +00:00
robtfm	d95d20f4b1	prepass vertex shader always outputs world position (#10657 ) # Objective the pbr prepass vertex shader currently only sets `VertexOutput::world_position` when deferred or motion prepasses are enabled. the field is always in the vertex output so is otherwise undetermined, and the calculation is very cheap. ## Solution always set the world position in the pbr prepass vert shader.	2023-11-28 12:13:28 +00:00
IceSentry	6d0c11a28f	Bind group layout entries (#10224 ) # Objective - Follow up to #9694 ## Solution - Same api as #9694 but adapted for `BindGroupLayoutEntry` - Use the same `ShaderStages` visibilty for all entries by default - Add `BindingType` helper function that mirror the wgsl equivalent and that make writing layouts much simpler. Before: ```rust let layout = render_device.create_bind_group_layout(&BindGroupLayoutDescriptor { label: Some("post_process_bind_group_layout"), entries: &[ BindGroupLayoutEntry { binding: 0, visibility: ShaderStages::FRAGMENT, ty: BindingType::Texture { sample_type: TextureSampleType::Float { filterable: true }, view_dimension: TextureViewDimension::D2, multisampled: false, }, count: None, }, BindGroupLayoutEntry { binding: 1, visibility: ShaderStages::FRAGMENT, ty: BindingType::Sampler(SamplerBindingType::Filtering), count: None, }, BindGroupLayoutEntry { binding: 2, visibility: ShaderStages::FRAGMENT, ty: BindingType::Buffer { ty: bevy::render::render_resource::BufferBindingType::Uniform, has_dynamic_offset: false, min_binding_size: Some(PostProcessSettings::min_size()), }, count: None, }, ], }); ``` After: ```rust let layout = render_device.create_bind_group_layout( "post_process_bind_group_layout"), &BindGroupLayoutEntries::sequential( ShaderStages::FRAGMENT, ( texture_2d_f32(), sampler(SamplerBindingType::Filtering), uniform_buffer(false, Some(PostProcessSettings::min_size())), ), ), ); ``` Here's a more extreme example in bevy_solari: `86dab7f5da` --- ## Changelog - Added `BindGroupLayoutEntries` and all `BindingType` helper functions. ## Migration Guide `RenderDevice::create_bind_group_layout()` doesn't take a `BindGroupLayoutDescriptor` anymore. You need to provide the parameters separately ```rust // 0.12 let layout = render_device.create_bind_group_layout(&BindGroupLayoutDescriptor { label: Some("post_process_bind_group_layout"), entries: &[ BindGroupLayoutEntry { // ... }, ], }); // 0.13 let layout = render_device.create_bind_group_layout( "post_process_bind_group_layout", &[ BindGroupLayoutEntry { // ... }, ], ); ``` ## TODO - [x] implement a `Dynamic` variant - [x] update the `RenderDevice::create_bind_group_layout()` api to match the one from `RenderDevice::creat_bind_group()` - [x] docs	2023-11-28 04:00:49 +00:00
Carter Anderson	90958104cb	Ensure instance_index push constant is always used in prepass.wgsl (#10706 ) # Objective Kind of helps #10509 ## Solution Add a line to `prepass.wgsl` that ensure the `instance_index` push constant is always used on WebGL 2. This is not a full fix, as the _second_ a custom shader is used that doesn't use the push constant, the breakage will resurface. We have satisfying medium term and long term solutions. This is just a short term hack for 0.12.1 that will make more cases work. See #10509 for more details.	2023-11-28 01:13:25 +00:00
kayh	c98481aa56	Fix bevy_pbr shader function name (#10423 ) # Objective Fix a shader error that happens when using pbr morph targets. ## Solution Fix the function name in the `prepass.wgsl` shader, which is incorrectly prefixed with `morph::` (added in `61bad4eb57 (diff-97e4500f0a36bc6206d7b1490c8dd1a69459ee39dc6822eb9b2f7b160865f49fR42)`). This section of the shader is only enabled when using morph targets, so it seems like there are no tests / examples using it?	2023-11-07 20:04:25 +00:00
robtfm	5cc3352f5b	allow DeferredPrepass to work without other prepass markers (#10223 ) # Objective fix crash / misbehaviour when `DeferredPrepass` is used without `DepthPrepass`. - Deferred lighting requires the depth prepass texture to be present, so that the depth texture is available for binding. without it the deferred lighting pass will use 0 for depth of all meshes. - When `DeferredPrepass` is used without other prepass markers, and with any materials that use `OpaqueRenderMode::Forward`, those entities will try to queue to the `Opaque3dPrepass` render phase, which doesn't exist, causing a crash. ## Solution - check if the prepass phases exist before queueing - generate prepass textures if `Opaque3dDeferred` is present - add a note to the DeferredPrepass marker to note that DepthPrepass is also required by the default deferred lighting pass - also changed some `With<T>.is_some()`s to `Has<T>`s	2023-11-03 01:09:14 +00:00
Christopher Biscardi	74b5073f75	Make VERTEX_COLORS usable in prepass shader, if available (#10341 ) # Objective I was working with forward rendering prepass fragment shaders and ran into an issue of not being able to access vertex colors in the prepass. I was able to access vertex colors in regular fragment shaders as well as in deferred shaders. ## Solution It seems like this `if` was nested unintentionally as moving it outside of the `deferred` block works. --- ## Changelog Enable vertex colors in forward rendering prepass fragment shaders	2023-11-03 00:54:13 +00:00
Marco Buono	44928e0df4	`StandardMaterial` Light Transmission (#8015 ) # Objective <img width="1920" alt="Screenshot 2023-04-26 at 01 07 34" src="https://user-images.githubusercontent.com/418473/234467578-0f34187b-5863-4ea1-88e9-7a6bb8ce8da3.png"> This PR adds both diffuse and specular light transmission capabilities to the `StandardMaterial`, with support for screen space refractions. This enables realistically representing a wide range of real-world materials, such as: - Glass; (Including frosted glass) - Transparent and translucent plastics; - Various liquids and gels; - Gemstones; - Marble; - Wax; - Paper; - Leaves; - Porcelain. Unlike existing support for transparency, light transmission does not rely on fixed function alpha blending, and therefore works with both `AlphaMode::Opaque` and `AlphaMode::Mask` materials. ## Solution - Introduces a number of transmission related fields in the `StandardMaterial`; - For specular transmission: - Adds logic to take a view main texture snapshot after the opaque phase; (in order to perform screen space refractions) - Introduces a new `Transmissive3d` phase to the renderer, to which all meshes with `transmission > 0.0` materials are sent. - Calculates a light exit point (of the approximate mesh volume) using `ior` and `thickness` properties - Samples the snapshot texture with an adaptive number of taps across a `roughness`-controlled radius enabling “blurry” refractions - For diffuse transmission: - Approximates transmitted diffuse light by using a second, flipped + displaced, diffuse-only Lambertian lobe for each light source. ## To Do - [x] Figure out where `fresnel_mix()` is taking place, if at all, and where `dielectric_specular` is being calculated, if at all, and update them to use the `ior` value (Not a blocker, just a nice-to-have for more correct BSDF) - To the _best of my knowledge, this is now taking place, after `964340cdd`. The fresnel mix is actually "split" into two parts in our implementation, one `(1 - fresnel(...))` in the transmission, and `fresnel()` in the light implementations. A surface with more reflectance now will produce slightly dimmer transmission towards the grazing angle, as more of the light gets reflected. - [x] Add `transmission_texture` - [x] Add `diffuse_transmission_texture` - [x] Add `thickness_texture` - [x] Add `attenuation_distance` and `attenuation_color` - [x] Connect values to glTF loader - [x] `transmission` and `transmission_texture` - [x] `thickness` and `thickness_texture` - [x] `ior` - [ ] `diffuse_transmission` and `diffuse_transmission_texture` (needs upstream support in `gltf` crate, not a blocker) - [x] Add support for multiple screen space refraction “steps” - [x] Conditionally create no transmission snapshot texture at all if `steps == 0` - [x] Conditionally enable/disable screen space refraction transmission snapshots - [x] Read from depth pre-pass to prevent refracting pixels in front of the light exit point - [x] Use `interleaved_gradient_noise()` function for sampling blur in a way that benefits from TAA - [x] Drill down a TAA `#define`, tweak some aspects of the effect conditionally based on it - [x] Remove const array that's crashing under HLSL (unless a new `naga` release with https://github.com/gfx-rs/naga/pull/2496 comes out before we merge this) - [ ] Look into alternatives to the `switch` hack for dynamically indexing the const array (might not be needed, compilers seem to be decent at expanding it) - [ ] Add pipeline keys for gating transmission (do we really want/need this?) - [x] Tweak some material field/function names? ## A Note on Texture Packing _This was originally added as a comment to the `specular_transmission_texture`, `thickness_texture` and `diffuse_transmission_texture` documentation, I removed it since it was more confusing than helpful, and will likely be made redundant/will need to be updated once we have a better infrastructure for preprocessing assets_ Due to how channels are mapped, you can more efficiently use a single shared texture image for configuring the following: - R - `specular_transmission_texture` - G - `thickness_texture` - B - _unused_ - A - `diffuse_transmission_texture` The `KHR_materials_diffuse_transmission` glTF extension also defines a `diffuseTransmissionColorTexture`, that _we don't currently support_. One might choose to pack the intensity and color textures together, using RGB for the color and A for the intensity, in which case this packing advice doesn't really apply. --- ## Changelog - Added a new `Transmissive3d` render phase for rendering specular transmissive materials with screen space refractions - Added rendering support for transmitted environment map light on the `StandardMaterial` as a fallback for screen space refractions - Added `diffuse_transmission`, `specular_transmission`, `thickness`, `ior`, `attenuation_distance` and `attenuation_color` to the `StandardMaterial` - Added `diffuse_transmission_texture`, `specular_transmission_texture`, `thickness_texture` to the `StandardMaterial`, gated behind a new `pbr_transmission_textures` cargo feature (off by default, for maximum hardware compatibility) - Added `Camera3d::screen_space_specular_transmission_steps` for controlling the number of “layers of transparency” rendered for transmissive objects - Added a `TransmittedShadowReceiver` component for enabling shadows in (diffusely) transmitted light. (disabled by default, as it requires carefully setting up the `thickness` to avoid self-shadow artifacts) - Added support for the `KHR_materials_transmission`, `KHR_materials_ior` and `KHR_materials_volume` glTF extensions - Renamed items related to temporal jitter for greater consistency ## Migration Guide - `SsaoPipelineKey::temporal_noise` has been renamed to `SsaoPipelineKey::temporal_jitter` - The `TAA` shader def (controlled by the presence of the `TemporalAntiAliasSettings` component in the camera) has been replaced with the `TEMPORAL_JITTER` shader def (controlled by the presence of the `TemporalJitter` component in the camera) - `MeshPipelineKey::TAA` has been replaced by `MeshPipelineKey::TEMPORAL_JITTER` - The `TEMPORAL_NOISE` shader def has been consolidated with `TEMPORAL_JITTER`	2023-10-31 20:59:02 +00:00
JMS55	b208388af9	Smaller TAA fixes (#10200 ) Extracted the easy stuff from #8974 . # Problem 1. Commands from `update_previous_view_projections` would crash when matching entities were despawned. 2. `TaaPipelineId` and `draw_3d_graph` module were not public. 3. When the motion vectors pointed to pixels that are now off screen, a smearing artifact could occur. # Solution 1. Use `try_insert` command instead. 2. Make them public, renaming to `TemporalAntiAliasPipelineId`. 3. Check for this case, and ignore history for pixels that are off-screen.	2023-10-27 23:13:14 +00:00
Nicola Papale	66f72dd25b	Use wildcard imports in bevy_pbr (#9847 ) # Objective - the style of import used by bevy guarantees merge conflicts when any file change - This is especially true when import lists are large, such as in `bevy_pbr` - Merge conflicts are tricky to resolve. This bogs down rendering PRs and makes contributing to bevy's rendering system more difficult than it needs to ## Solution - Use wildcard imports to replace multiline import list in `bevy_pbr` I suspect this is controversial, but I'd like to hear alternatives. Because this is one of many papercuts that makes developing render features near impossible.	2023-10-25 08:40:55 +00:00
robtfm	6f2a5cb862	Bind group entries (#9694 ) # Objective Simplify bind group creation code. alternative to (and based on) #9476 ## Solution - Add a `BindGroupEntries` struct that can transparently be used where `&[BindGroupEntry<'b>]` is required in BindGroupDescriptors. Allows constructing the descriptor's entries as: ```rust render_device.create_bind_group( "my_bind_group", &my_layout, &BindGroupEntries::with_indexes(( (2, &my_sampler), (3, my_uniform), )), ); ``` instead of ```rust render_device.create_bind_group( "my_bind_group", &my_layout, &[ BindGroupEntry { binding: 2, resource: BindingResource::Sampler(&my_sampler), }, BindGroupEntry { binding: 3, resource: my_uniform, }, ], ); ``` or ```rust render_device.create_bind_group( "my_bind_group", &my_layout, &BindGroupEntries::sequential((&my_sampler, my_uniform)), ); ``` instead of ```rust render_device.create_bind_group( "my_bind_group", &my_layout, &[ BindGroupEntry { binding: 0, resource: BindingResource::Sampler(&my_sampler), }, BindGroupEntry { binding: 1, resource: my_uniform, }, ], ); ``` the structs has no user facing macros, is tuple-type-based so stack allocated, and has no noticeable impact on compile time. - Also adds a `DynamicBindGroupEntries` struct with a similar api that uses a `Vec` under the hood and allows extending the entries. - Modifies `RenderDevice::create_bind_group` to take separate arguments `label`, `layout` and `entries` instead of a `BindGroupDescriptor` struct. The struct can't be stored due to the internal references, and with only 3 members arguably does not add enough context to justify itself. - Modify the codebase to use the new api and the `BindGroupEntries` / `DynamicBindGroupEntries` structs where appropriate (whenever the entries slice contains more than 1 member). ## Migration Guide - Calls to `RenderDevice::create_bind_group({BindGroupDescriptor { label, layout, entries })` must be amended to `RenderDevice::create_bind_group(label, layout, entries)`. - If `label`s have been specified as `"bind_group_name".into()`, they need to change to just `"bind_group_name"`. `Some("bind_group_name")` and `None` will still work, but `Some("bind_group_name")` can optionally be simplified to just `"bind_group_name"`. --------- Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>	2023-10-21 15:39:22 +00:00
robtfm	61bad4eb57	update shader imports (#10180 ) # Objective - bump naga_oil to 0.10 - update shader imports to use rusty syntax ## Migration Guide naga_oil 0.10 reworks the import mechanism to support more syntax to make it more rusty, and test for item use before importing to determine which imports are modules and which are items, which allows: - use rust-style imports ``` #import bevy_pbr::{ pbr_functions::{alpha_discard as discard, apply_pbr_lighting}, mesh_bindings, } ``` - import partial paths: ``` #import part::of::path ... path::remainder::function(); ``` which will call to `part::of::path::remainder::function` - use fully qualified paths without importing: ``` // #import bevy_pbr::pbr_functions bevy_pbr::pbr_functions::pbr() ``` - use imported items without qualifying ``` #import bevy_pbr::pbr_functions::pbr // for backwards compatibility the old style is still supported: // #import bevy_pbr::pbr_functions pbr ... pbr() ``` - allows most imported items to end with `_` and numbers (naga_oil#30). still doesn't allow struct members to end with `_` or numbers but it's progress. - the vast majority of existing shader code will work without changes, but will emit "deprecated" warnings for old-style imports. these can be suppressed with the `allow-deprecated` feature. - partly breaks overrides (as far as i'm aware nobody uses these yet) - now overrides will only be applied if the overriding module is added as an additional import in the arguments to `Composer::make_naga_module` or `Composer::add_composable_module`. this is necessary to support determining whether imports are modules or items.	2023-10-21 11:51:58 +00:00
Marco Buono	9b80205acb	Variable `MeshPipeline` View Bind Group Layout (#10156 ) # Objective This PR aims to make it so that we don't accidentally go over `MAX_TEXTURE_IMAGE_UNITS` (in WebGL) or `maxSampledTexturesPerShaderStage` (in WebGPU), giving us some extra leeway to add more view bind group textures. (This PR is extracted from—and unblocks—#8015) ## Solution - We replace the existing `view_layout` and `view_layout_multisampled` pair with an array of 32 bind group layouts, generated ahead of time; - For now, these layouts cover all the possible combinations of: `multisampled`, `depth_prepass`, `normal_prepass`, `motion_vector_prepass` and `deferred_prepass`: - In the future, as @JMS55 pointed out, we can likely take out `motion_vector_prepass` and `deferred_prepass`, as these are not really needed for the mesh pipeline and can use separate pipelines. This would bring the possible combinations down to 8; - We can also add more "optional" textures as they become needed, allowing the engine to scale to a wider variety of use cases in lower end/web environments (e.g. some apps might just want normal and depth prepasses, others might only want light probes), while still keeping a high ceiling for high end native environments where more textures are supported. - While preallocating bind group layouts is relatively cheap, the number of combinations grows exponentially, so we should likely limit ourselves to something like at most 256–1024 total layouts until we find a better solution (like generating them lazily) - To make this mechanism a little bit more explicit/discoverable, so that compatibility with WebGPU/WebGL is not broken by accident, we add a `MESH_PIPELINE_VIEW_LAYOUT_SAFE_MAX_TEXTURES` const and warn whenever the number of textures in the layout crosses it. - The warning is gated by `#[cfg(debug_assertions)]` and not issued in release builds; - We're counting the actual textures in the bind group layout instead of using some roundabout metric so it should be accurate; - Right now `MESH_PIPELINE_VIEW_LAYOUT_SAFE_MAX_TEXTURES` is set to 10 in order to leave 6 textures free for other groups; - Currently there's no combination that would cause us to go over the limit, but that will change once #8015 lands. --- ## Changelog - `MeshPipeline` view bind group layouts now vary based on the current multisampling and prepass states, saving a couple of texture binding entries when prepasses are not in use. ## Migration Guide - `MeshPipeline::view_layout` and `MeshPipeline::view_layout_multisampled` have been replaced with a private array to accomodate for variable view bind group layouts. To obtain a view bind group layout for the current pipeline state, use the new `MeshPipeline::get_view_layout()` or `MeshPipeline::get_view_layout_from_key()` methods.	2023-10-21 11:19:44 +00:00
robtfm	de8a6007b7	check for any prepass phase (#10160 ) # Objective deferred doesn't currently run unless one of `DepthPrepass`, `ForwardPrepass` or `MotionVectorPrepass` is also present on the camera. ## Solution modify the `queue_prepass_material_meshes` view query to check for any relevant phase, instead of requiring `Opaque3dPrepass` and `AlphaMask3dPrepass` to be present	2023-10-17 19:28:52 +00:00
Marco Buono	5733d2403e	`_PREPASS` Shader Def Cleanup (#10136 ) # Objective - This PR aims to make the various `_PREPASS` shader defs we have (`NORMAL_PREPASS`, `DEPTH_PREPASS`, `MOTION_VECTORS_PREPASS` AND `DEFERRED_PREPASS`) easier to use and understand: - So that their meaning is now consistent across all contexts; (“prepass X is enabled for the current view”) - So that they're also consistently set across all contexts. - It also aims to enable us to (with a follow up PR) to conditionally gate the `BindGroupEntry` and `BindGroupLayoutEntry` items associated with these prepasses, saving us up to 4 texture slots in WebGL (currently globally limited to 16 per shader, regardless of bind groups) ## Solution - We now consistently set these from `PrepassPipeline`, the `MeshPipeline` and the `DeferredLightingPipeline`, we also set their `MeshPipelineKey`s; - We introduce `PREPASS_PIPELINE`, `MESH_PIPELINE` and `DEFERRED_LIGHTING_PIPELINE` that can be used to detect where the code is running, without overloading the meanings of the prepass shader defs; - We also gate the WGSL functions in `bevy_pbr::prepass_utils` with `#ifdef`s for their respective shader defs, so that shader code can provide a fallback whenever they're not available. - This allows us to conditionally include the bindings for these prepass textures (My next PR, which will hopefully unblock #8015) - @robtfm mentioned [these were being used to prevent accessing the same binding as read/write in the prepass](https://discord.com/channels/691052431525675048/743663924229963868/1163270458393759814), however even after reversing the `#ifndef`s I had no issues running the code, so perhaps the compiler is already smart enough even without tree shaking to know they're not being used, thanks to `#ifdef PREPASS_PIPELINE`? ## Comparison ### Before \| Shader Def \| `PrepassPipeline` \| `MeshPipeline` \| `DeferredLightingPipeline` \| \| ------------------------ \| ----------------- \| -------------- \| -------------------------- \| \| `NORMAL_PREPASS` \| Yes \| No \| No \| \| `DEPTH_PREPASS` \| Yes \| No \| No \| \| `MOTION_VECTORS_PREPASS` \| Yes \| No \| No \| \| `DEFERRED_PREPASS` \| Yes \| No \| No \| \| View Key \| `PrepassPipeline` \| `MeshPipeline` \| `DeferredLightingPipeline` \| \| ------------------------ \| ----------------- \| -------------- \| -------------------------- \| \| `NORMAL_PREPASS` \| Yes \| Yes \| No \| \| `DEPTH_PREPASS` \| Yes \| No \| No \| \| `MOTION_VECTORS_PREPASS` \| Yes \| No \| No \| \| `DEFERRED_PREPASS` \| Yes \| Yes\* \| No \| \* Accidentally was being set twice, once with only `deferred_prepass.is_some()` as a condition, and once with `deferred_p repass.is_some() && !forward` as a condition. ### After \| Shader Def \| `PrepassPipeline` \| `MeshPipeline` \| `DeferredLightingPipeline` \| \| ---------------------------- \| ----------------- \| --------------- \| -------------------------- \| \| `NORMAL_PREPASS` \| Yes \| Yes \| Yes \| \| `DEPTH_PREPASS` \| Yes \| Yes \| Yes \| \| `MOTION_VECTORS_PREPASS` \| Yes \| Yes \| Yes \| \| `DEFERRED_PREPASS` \| Yes \| Yes \| Unconditionally \| \| `PREPASS_PIPELINE` \| Unconditionally \| No \| No \| \| `MESH_PIPELINE` \| No \| Unconditionally \| No \| \| `DEFERRED_LIGHTING_PIPELINE` \| No \| No \| Unconditionally \| \| View Key \| `PrepassPipeline` \| `MeshPipeline` \| `DeferredLightingPipeline` \| \| ------------------------ \| ----------------- \| -------------- \| -------------------------- \| \| `NORMAL_PREPASS` \| Yes \| Yes \| Yes \| \| `DEPTH_PREPASS` \| Yes \| Yes \| Yes \| \| `MOTION_VECTORS_PREPASS` \| Yes \| Yes \| Yes \| \| `DEFERRED_PREPASS` \| Yes \| Yes \| Unconditionally \| --- ## Changelog - Cleaned up WGSL `*_PREPASS` shader defs so they're now consistently used everywhere; - Introduced `PREPASS_PIPELINE`, `MESH_PIPELINE` and `DEFERRED_LIGHTING_PIPELINE` WGSL shader defs for conditionally compiling logic based the current pipeline; - WGSL functions from `bevy_pbr::prepass_utils` are now guarded with `#ifdef` based on the currently enabled prepasses; ## Migration Guide - When using functions from `bevy_pbr::prepass_utils` (`prepass_depth()`, `prepass_normal()`, `prepass_motion_vector()`) in contexts where these prepasses might be disabled, you should now wrap your calls with the appropriate `#ifdef` guards, (`#ifdef DEPTH_PREPASS`, `#ifdef NORMAL_PREPASS`, `#ifdef MOTION_VECTOR_PREPASS`) providing fallback logic where applicable. --------- Co-authored-by: Carter Anderson <mcanders1@gmail.com> Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>	2023-10-17 00:16:21 +00:00
robtfm	979c4094d4	pbr shader cleanup (#10105 ) # Objective cleanup some pbr shader code. improve shader stage io consistency and make pbr.wgsl (probably many people's first foray into bevy shader code) a little more human-readable. also fix a couple of small issues with deferred rendering. ## Solution mesh_vertex_output: - rename to forward_io (to align with prepass_io) - rename `MeshVertexOutput` to `VertexOutput` (to align with prepass_io) - move `Vertex` from mesh.wgsl into here (to align with prepass_io) prepass_io: - remove `FragmentInput`, use `VertexOutput` directly (to align with forward_io) - rename `VertexOutput::clip_position` to `position` (to align with forward_io) pbr.wgsl: - restructure so we don't need `#ifdefs` on the actual entrypoint, use VertexOutput and FragmentOutput in all cases and use #ifdefs to import the right struct definitions. - rearrange to make the flow clearer - move alpha_discard up from `pbr_functions::pbr` to avoid needing to call it on some branches and not others - add a bunch of comments deferred_lighting: - move ssao into the `!unlit` block to reflect forward behaviour correctly - fix compile error with deferred + premultiply_alpha ## Migration Guide in custom material shaders: - `pbr_functions::pbr` no longer calls to `pbr_functions::alpha_discard`. if you were using the `pbr` function in a custom shader with alpha mask mode you now also need to call alpha_discard manually - rename imports of `bevy_pbr::mesh_vertex_output` to `bevy_pbr::forward_io` - rename instances of `MeshVertexOutput` to `VertexOutput` in custom material prepass shaders: - rename instances of `VertexOutput::clip_position` to `VertexOutput::position`	2023-10-13 19:12:40 +00:00
Griffin	a15d152635	Deferred Renderer (#9258 ) # Objective - Add a [Deferred Renderer](https://en.wikipedia.org/wiki/Deferred_shading) to Bevy. - This allows subsequent passes to access per pixel material information before/during shading. - Accessing this per pixel material information is needed for some features, like GI. It also makes other features (ex. Decals) simpler to implement and/or improves their capability. There are multiple approaches to accomplishing this. The deferred shading approach works well given the limitations of WebGPU and WebGL2. Motivation: [I'm working on a GI solution for Bevy](https://youtu.be/eH1AkL-mwhI) # Solution - The deferred renderer is implemented with a prepass and a deferred lighting pass. - The prepass renders opaque objects into the Gbuffer attachment (`Rgba32Uint`). The PBR shader generates a `PbrInput` in mostly the same way as the forward implementation and then [packs it into the Gbuffer](`ec1465559f/crates/bevy_pbr/src/render/pbr.wgsl (L168)`). - The deferred lighting pass unpacks the `PbrInput` and [feeds it into the pbr() function](`ec1465559f/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl (L65)`), then outputs the shaded color data. - There is now a resource [DefaultOpaqueRendererMethod](`ec1465559f/crates/bevy_pbr/src/material.rs (L599)`) that can be used to set the default render method for opaque materials. If materials return `None` from [opaque_render_method()](`ec1465559f/crates/bevy_pbr/src/material.rs (L131)`) the `DefaultOpaqueRendererMethod` will be used. Otherwise, custom materials can also explicitly choose to only support Deferred or Forward by returning the respective [OpaqueRendererMethod](`ec1465559f/crates/bevy_pbr/src/material.rs (L603)`) - Deferred materials can be used seamlessly along with both opaque and transparent forward rendered materials in the same scene. The [deferred rendering example](https://github.com/DGriffin91/bevy/blob/deferred/examples/3d/deferred_rendering.rs) does this. - The deferred renderer does not support MSAA. If any deferred materials are used, MSAA must be disabled. Both TAA and FXAA are supported. - Deferred rendering supports WebGL2/WebGPU. ## Custom deferred materials - Custom materials can support both deferred and forward at the same time. The [StandardMaterial](`ec1465559f/crates/bevy_pbr/src/render/pbr.wgsl (L166)`) does this. So does [this example](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56). - Custom deferred materials that require PBR lighting can create a `PbrInput`, write it to the deferred GBuffer and let it be rendered by the `PBRDeferredLightingPlugin`. - Custom deferred materials that require custom lighting have two options: 1. Use the base_color channel of the `PbrInput` combined with the `STANDARD_MATERIAL_FLAGS_UNLIT_BIT` flag. [Example.](https://github.com/DGriffin91/bevy_glowy_orb_tutorial/blob/deferred/assets/shaders/glowy.wgsl#L56) (If the unlit bit is set, the base_color is stored as RGB9E5 for extra precision) 2. A Custom Deferred Lighting pass can be created, either overriding the default, or running in addition. The a depth buffer is used to limit rendering to only the required fragments for each deferred lighting pass. Materials can set their respective depth id via the [deferred_lighting_pass_id](`b79182d2a3/crates/bevy_pbr/src/prepass/prepass_io.wgsl (L95)`) attachment. The custom deferred lighting pass plugin can then set [its corresponding depth](`ec1465559f/crates/bevy_pbr/src/deferred/deferred_lighting.wgsl (L37)`). Then with the lighting pass using [CompareFunction::Equal](`ec1465559f/crates/bevy_pbr/src/deferred/mod.rs (L335)`), only the fragments with a depth that equal the corresponding depth written in the material will be rendered. Custom deferred lighting plugins can also be created to render the StandardMaterial. The default deferred lighting plugin can be bypassed with `DefaultPlugins.set(PBRDeferredLightingPlugin { bypass: true })` --------- Co-authored-by: nickrart <nickolas.g.russell@gmail.com>	2023-10-12 22:10:38 +00:00
Robert Swain	b6ead2be95	Use EntityHashMap<Entity, T> for render world entity storage for better performance (#9903 ) # Objective - Improve rendering performance, particularly by avoiding the large system commands costs of using the ECS in the way that the render world does. ## Solution - Define `EntityHasher` that calculates a hash from the `Entity.to_bits()` by `i \| (i.wrapping_mul(0x517cc1b727220a95) << 32)`. `0x517cc1b727220a95` is something like `u64::MAX / N` for N that gives a value close to π and that works well for hashing. Thanks for @SkiFire13 for the suggestion and to @nicopap for alternative suggestions and discussion. This approach comes from `rustc-hash` (a.k.a. `FxHasher`) with some tweaks for the case of hashing an `Entity`. `FxHasher` and `SeaHasher` were also tested but were significantly slower. - Define `EntityHashMap` type that uses the `EntityHashser` - Use `EntityHashMap<Entity, T>` for render world entity storage, including: - `RenderMaterialInstances` - contains the `AssetId<M>` of the material associated with the entity. Also for 2D. - `RenderMeshInstances` - contains mesh transforms, flags and properties about mesh entities. Also for 2D. - `SkinIndices` and `MorphIndices` - contains the skin and morph index for an entity, respectively - `ExtractedSprites` - `ExtractedUiNodes` ## Benchmarks All benchmarks have been conducted on an M1 Max connected to AC power. The tests are run for 1500 frames. The 1000th frame is captured for comparison to check for visual regressions. There were none. ### 2D Meshes `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` #### `--ordered-z` This test spawns the 2D meshes with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1112" alt="Screenshot 2023-09-27 at 07 50 45" src="https://github.com/bevyengine/bevy/assets/302146/e140bc98-7091-4a3b-8ae1-ab75d16d2ccb"> -39.1% median frame time. #### Random This test spawns the 2D meshes with random z. This not only makes the batching and transparent 2D pass lookups get a lot of cache misses, it also currently means that the meshes are almost certain to not be batchable. <img width="1108" alt="Screenshot 2023-09-27 at 07 51 28" src="https://github.com/bevyengine/bevy/assets/302146/29c2e813-645a-43ce-982a-55df4bf7d8c4"> -7.2% median frame time. ### 3D Meshes `many_cubes --benchmark` <img width="1112" alt="Screenshot 2023-09-27 at 07 51 57" src="https://github.com/bevyengine/bevy/assets/302146/1a729673-3254-4e2a-9072-55e27c69f0fc"> -7.7% median frame time. ### Sprites NOTE: On `main` sprites are using `SparseSet<Entity, T>`! `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` #### `--ordered-z` This test spawns the sprites with z incrementing back to front, which is the ideal arrangement allocation order as it matches the sorted render order which means lookups have a high cache hit rate. <img width="1116" alt="Screenshot 2023-09-27 at 07 52 31" src="https://github.com/bevyengine/bevy/assets/302146/bc8eab90-e375-4d31-b5cd-f55f6f59ab67"> +13.0% median frame time. #### Random This test spawns the sprites with random z. This makes the batching and transparent 2D pass lookups get a lot of cache misses. <img width="1109" alt="Screenshot 2023-09-27 at 07 53 01" src="https://github.com/bevyengine/bevy/assets/302146/22073f5d-99a7-49b0-9584-d3ac3eac3033"> +0.6% median frame time. ### UI NOTE: On `main` UI is using `SparseSet<Entity, T>`! `many_buttons` <img width="1111" alt="Screenshot 2023-09-27 at 07 53 26" src="https://github.com/bevyengine/bevy/assets/302146/66afd56d-cbe4-49e7-8b64-2f28f6043d85"> +15.1% median frame time. ## Alternatives - Cart originally suggested trying out `SparseSet<Entity, T>` and indeed that is slightly faster under ideal conditions. However, `PassHashMap<Entity, T>` has better worst case performance when data is randomly distributed, rather than in sorted render order, and does not have the worst case memory usage that `SparseSet`'s dense `Vec<usize>` that maps from the `Entity` index to sparse index into `Vec<T>`. This dense `Vec` has to be as large as the largest Entity index used with the `SparseSet`. - I also tested `PassHashMap<u32, T>`, intending to use `Entity.index()` as the key, but this proved to sometimes be slower and mostly no different. - The only outstanding approach that has not been implemented and tested is to _not_ clear the render world of its entities each frame. That has its own problems, though they could perhaps be solved. - Performance-wise, if the entities and their component data were not cleared, then they would incur table moves on spawn, and should not thereafter, rather just their component data would be overwritten. Ideally we would have a neat way of either updating data in-place via `&mut T` queries, or inserting components if not present. This would likely be quite cumbersome to have to remember to do everywhere, but perhaps it only needs to be done in the more performance-sensitive systems. - The main problem to solve however is that we want to both maintain a mapping between main world entities and render world entities, be able to run the render app and world in parallel with the main app and world for pipelined rendering, and at the same time be able to spawn entities in the render world in such a way that those Entity ids do not collide with those spawned in the main world. This is potentially quite solvable, but could well be a lot of ECS work to do it in a way that makes sense. --- ## Changelog - Changed: Component data for entities to be drawn are no longer stored on entities in the render world. Instead, data is stored in a `EntityHashMap<Entity, T>` in various resources. This brings significant performance benefits due to the way the render app clears entities every frame. Resources of most interest are `RenderMeshInstances` and `RenderMaterialInstances`, and their 2D counterparts. ## Migration Guide Previously the render app extracted mesh entities and their component data from the main world and stored them as entities and components in the render world. Now they are extracted into essentially `EntityHashMap<Entity, T>` where `T` are structs containing an appropriate group of data. This means that while extract set systems will continue to run extract queries against the main world they will store their data in hash maps. Also, systems in later sets will either need to look up entities in the available resources such as `RenderMeshInstances`, or maintain their own `EntityHashMap<Entity, T>` for their own data. Before: ```rust fn queue_custom( material_meshes: Query<(Entity, &MeshTransforms, &Handle<Mesh>), With<InstanceMaterialData>>, ) { ... for (entity, mesh_transforms, mesh_handle) in &material_meshes { ... } } ``` After: ```rust fn queue_custom( render_mesh_instances: Res<RenderMeshInstances>, instance_entities: Query<Entity, With<InstanceMaterialData>>, ) { ... for entity in &instance_entities { let Some(mesh_instance) = render_mesh_instances.get(&entity) else { continue; }; // The mesh handle in `AssetId<Mesh>` form, and the `MeshTransforms` can now // be found in `mesh_instance` which is a `RenderMeshInstance` ... } } ``` --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>	2023-09-27 08:28:28 +00:00
James Liu	12032cd296	Directly copy data into uniform buffers (#9865 ) # Objective This is a minimally disruptive version of #8340. I attempted to update it, but failed due to the scope of the changes added in #8204. Fixes #8307. Partially addresses #4642. As seen in https://github.com/bevyengine/bevy/issues/8284, we're actually copying data twice in Prepare stage systems. Once into a CPU-side intermediate scratch buffer, and once again into a mapped buffer. This is inefficient and effectively doubles the time spent and memory allocated to run these systems. ## Solution Skip the scratch buffer entirely and use `wgpu::Queue::write_buffer_with` to directly write data into mapped buffers. Separately, this also directly uses `wgpu::Limits::min_uniform_buffer_offset_alignment` to set up the alignment when writing to the buffers. Partially addressing the issue raised in #4642. Storage buffers and the abstractions built on top of `DynamicUniformBuffer` will need to come in followup PRs. This may not have a noticeable performance difference in this PR, as the only first-party systems affected by this are view related, and likely are not going to be particularly heavy. --- ## Changelog Added: `DynamicUniformBuffer::get_writer`. Added: `DynamicUniformBufferWriter`.	2023-09-25 19:15:37 +00:00
Robert Swain	5c884c5a15	Automatic batching/instancing of draw commands (#9685 ) # Objective - Implement the foundations of automatic batching/instancing of draw commands as the next step from #89 - NOTE: More performance improvements will come when more data is managed and bound in ways that do not require rebinding such as mesh, material, and texture data. ## Solution - The core idea for batching of draw commands is to check whether any of the information that has to be passed when encoding a draw command changes between two things that are being drawn according to the sorted render phase order. These should be things like the pipeline, bind groups and their dynamic offsets, index/vertex buffers, and so on. - The following assumptions have been made: - Only entities with prepared assets (pipelines, materials, meshes) are queued to phases - View bindings are constant across a phase for a given draw function as phases are per-view - `batch_and_prepare_render_phase` is the only system that performs this batching and has sole responsibility for preparing the per-object data. As such the mesh binding and dynamic offsets are assumed to only vary as a result of the `batch_and_prepare_render_phase` system, e.g. due to having to split data across separate uniform bindings within the same buffer due to the maximum uniform buffer binding size. - Implement `GpuArrayBuffer` for `Mesh2dUniform` to store Mesh2dUniform in arrays in GPU buffers rather than each one being at a dynamic offset in a uniform buffer. This is the same optimisation that was made for 3D not long ago. - Change batch size for a range in `PhaseItem`, adding API for getting or mutating the range. This is more flexible than a size as the length of the range can be used in place of the size, but the start and end can be otherwise whatever is needed. - Add an optional mesh bind group dynamic offset to `PhaseItem`. This avoids having to do a massive table move just to insert `GpuArrayBufferIndex` components. ## Benchmarks All tests have been run on an M1 Max on AC power. `bevymark` and `many_cubes` were modified to use 1920x1080 with a scale factor of 1. I run a script that runs a separate Tracy capture process, and then runs the bevy example with `--features bevy_ci_testing,trace_tracy` and `CI_TESTING_CONFIG=../benchmark.ron` with the contents of `../benchmark.ron`: ```rust ( exit_after: Some(1500) ) ``` ...in order to run each test for 1500 frames. The recent changes to `many_cubes` and `bevymark` added reproducible random number generation so that with the same settings, the same rng will occur. They also added benchmark modes that use a fixed delta time for animations. Combined this means that the same frames should be rendered both on main and on the branch. The graphs compare main (yellow) to this PR (red). ### 3D Mesh `many_cubes --benchmark` <img width="1411" alt="Screenshot 2023-09-03 at 23 42 10" src="https://github.com/bevyengine/bevy/assets/302146/2088716a-c918-486c-8129-090b26fd2bc4"> The mesh and material are the same for all instances. This is basically the best case for the initial batching implementation as it results in 1 draw for the ~11.7k visible meshes. It gives a ~30% reduction in median frame time. The 1000th frame is identical using the flip tool: ![flip many_cubes-main-mesh3d many_cubes-batching-mesh3d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2511f37a-6df8-481a-932f-706ca4de7643) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4615 seconds ``` ### 3D Mesh `many_cubes --benchmark --material-texture-count 10` <img width="1404" alt="Screenshot 2023-09-03 at 23 45 18" src="https://github.com/bevyengine/bevy/assets/302146/5ee9c447-5bd2-45c6-9706-ac5ff8916daf"> This run uses 10 different materials by varying their textures. The materials are randomly selected, and there is no sorting by material bind group for opaque 3D so any batching is 'random'. The PR produces a ~5% reduction in median frame time. If we were to sort the opaque phase by the material bind group, then this should be a lot faster. This produces about 10.5k draws for the 11.7k visible entities. This makes sense as randomly selecting from 10 materials gives a chance that two adjacent entities randomly select the same material and can be batched. The 1000th frame is identical in flip: ![flip many_cubes-main-mesh3d-mtc10 many_cubes-batching-mesh3d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/2b3a8614-9466-4ed8-b50c-d4aa71615dbb) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4537 seconds ``` ### 3D Mesh `many_cubes --benchmark --vary-per-instance` <img width="1394" alt="Screenshot 2023-09-03 at 23 48 44" src="https://github.com/bevyengine/bevy/assets/302146/f02a816b-a444-4c18-a96a-63b5436f3b7f"> This run varies the material data per instance by randomly-generating its colour. This is the worst case for batching and that it performs about the same as `main` is a good thing as it demonstrates that the batching has minimal overhead when dealing with ~11k visible mesh entities. The 1000th frame is identical according to flip: ![flip many_cubes-main-mesh3d-vpi many_cubes-batching-mesh3d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ac5f5c14-9bda-4d1a-8219-7577d4aac68c) ``` Mean: 0.000000 Weighted median: 0.000000 1st weighted quartile: 0.000000 3rd weighted quartile: 0.000000 Min: 0.000000 Max: 0.000000 Evaluation time: 0.4568 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d` <img width="1412" alt="Screenshot 2023-09-03 at 23 59 56" src="https://github.com/bevyengine/bevy/assets/302146/cb02ae07-237b-4646-ae9f-fda4dafcbad4"> This spawns 160 waves of 1000 quad meshes that are shaded with ColorMaterial. Each wave has a different material so 160 waves currently should result in 160 batches. This results in a 50% reduction in median frame time. Capturing a screenshot of the 1000th frame main vs PR gives: ![flip bevymark-main-mesh2d bevymark-batching-mesh2d 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/80102728-1217-4059-87af-14d05044df40) ``` Mean: 0.001222 Weighted median: 0.750432 1st weighted quartile: 0.453494 3rd weighted quartile: 0.969758 Min: 0.000000 Max: 0.990296 Evaluation time: 0.4255 seconds ``` So they seem to produce the same results. I also double-checked the number of draws. `main` does 160000 draws, and the PR does 160, as expected. ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --material-texture-count 10` <img width="1392" alt="Screenshot 2023-09-04 at 00 09 22" src="https://github.com/bevyengine/bevy/assets/302146/4358da2e-ce32-4134-82df-3ab74c40849c"> This generates 10 textures and generates materials for each of those and then selects one material per wave. The median frame time is reduced by 50%. Similar to the plain run above, this produces 160 draws on the PR and 160000 on `main` and the 1000th frame is identical (ignoring the fps counter text overlay). ![flip bevymark-main-mesh2d-mtc10 bevymark-batching-mesh2d-mtc10 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/ebed2822-dce7-426a-858b-b77dc45b986f) ``` Mean: 0.002877 Weighted median: 0.964980 1st weighted quartile: 0.668871 3rd weighted quartile: 0.982749 Min: 0.000000 Max: 0.992377 Evaluation time: 0.4301 seconds ``` ### 2D Mesh `bevymark --benchmark --waves 160 --per-wave 1000 --mode mesh2d --vary-per-instance` <img width="1396" alt="Screenshot 2023-09-04 at 00 13 53" src="https://github.com/bevyengine/bevy/assets/302146/b2198b18-3439-47ad-919a-cdabe190facb"> This creates unique materials per instance by randomly-generating the material's colour. This is the worst case for 2D batching. Somehow, this PR manages a 7% reduction in median frame time. Both main and this PR issue 160000 draws. The 1000th frame is the same: ![flip bevymark-main-mesh2d-vpi bevymark-batching-mesh2d-vpi 67ppd ldr](https://github.com/bevyengine/bevy/assets/302146/a2ec471c-f576-4a36-a23b-b24b22578b97) ``` Mean: 0.001214 Weighted median: 0.937499 1st weighted quartile: 0.635467 3rd weighted quartile: 0.979085 Min: 0.000000 Max: 0.988971 Evaluation time: 0.4462 seconds ``` ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite` <img width="1396" alt="Screenshot 2023-09-04 at 12 21 12" src="https://github.com/bevyengine/bevy/assets/302146/8b31e915-d6be-4cac-abf5-c6a4da9c3d43"> This just spawns 160 waves of 1000 sprites. There should be and is no notable difference between main and the PR. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --material-texture-count 10` <img width="1389" alt="Screenshot 2023-09-04 at 12 36 08" src="https://github.com/bevyengine/bevy/assets/302146/45fe8d6d-c901-4062-a349-3693dd044413"> This spawns the sprites selecting a texture at random per instance from the 10 generated textures. This has no significant change vs main and shouldn't. ### 2D Sprite `bevymark --benchmark --waves 160 --per-wave 1000 --mode sprite --vary-per-instance` <img width="1401" alt="Screenshot 2023-09-04 at 12 29 52" src="https://github.com/bevyengine/bevy/assets/302146/762c5c60-352e-471f-8dbe-bbf10e24ebd6"> This sets the sprite colour as being unique per instance. This can still all be drawn using one batch. There should be no difference but the PR produces median frame times that are 4% higher. Investigation showed no clear sources of cost, rather a mix of give and take that should not happen. It seems like noise in the results. ### Summary \| Benchmark \| % change in median frame time \| \| ------------- \| ------------- \| \| many_cubes \| 🟩 -30% \| \| many_cubes 10 materials \| 🟩 -5% \| \| many_cubes unique materials \| 🟩 ~0% \| \| bevymark mesh2d \| 🟩 -50% \| \| bevymark mesh2d 10 materials \| 🟩 -50% \| \| bevymark mesh2d unique materials \| 🟩 -7% \| \| bevymark sprite \| 🟥 2% \| \| bevymark sprite 10 materials \| 🟥 0.6% \| \| bevymark sprite unique materials \| 🟥 4.1% \| --- ## Changelog - Added: 2D and 3D mesh entities that share the same mesh and material (same textures, same data) are now batched into the same draw command for better performance. --------- Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Nicola Papale <nico@nicopap.ch>	2023-09-21 22:12:34 +00:00

1 2

87 Commits