forestia/bevy

Author	SHA1	Message	Date
Ricky Taylor	9b9d3d81cb	Normalise matrix naming (#13489 ) # Objective - Fixes #10909 - Fixes #8492 ## Solution - Name all matrices `x_from_y`, for example `world_from_view`. ## Testing - I've tested most of the 3D examples. The `lighting` example particularly should hit a lot of the changes and appears to run fine. --- ## Changelog - Renamed matrices across the engine to follow a `y_from_x` naming, making the space conversion more obvious. ## Migration Guide - `Frustum`'s `from_view_projection`, `from_view_projection_custom_far` and `from_view_projection_no_far` were renamed to `from_clip_from_world`, `from_clip_from_world_custom_far` and `from_clip_from_world_no_far`. - `ComputedCameraValues::projection_matrix` was renamed to `clip_from_view`. - `CameraProjection::get_projection_matrix` was renamed to `get_clip_from_view` (this affects implementations on `Projection`, `PerspectiveProjection` and `OrthographicProjection`). - `ViewRangefinder3d::from_view_matrix` was renamed to `from_world_from_view`. - `PreviousViewData`'s members were renamed to `view_from_world` and `clip_from_world`. - `ExtractedView`'s `projection`, `transform` and `view_projection` were renamed to `clip_from_view`, `world_from_view` and `clip_from_world`. - `ViewUniform`'s `view_proj`, `unjittered_view_proj`, `inverse_view_proj`, `view`, `inverse_view`, `projection` and `inverse_projection` were renamed to `clip_from_world`, `unjittered_clip_from_world`, `world_from_clip`, `world_from_view`, `view_from_world`, `clip_from_view` and `view_from_clip`. - `GpuDirectionalCascade::view_projection` was renamed to `clip_from_world`. - `MeshTransforms`' `transform` and `previous_transform` were renamed to `world_from_local` and `previous_world_from_local`. - `MeshUniform`'s `transform`, `previous_transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `previous_world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh` type in WGSL mirrors this, however `transform` and `previous_transform` were named `model` and `previous_model`). - `Mesh2dTransforms::transform` was renamed to `world_from_local`. - `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed to `world_from_local`, `local_from_world_transpose_a` and `local_from_world_transpose_b` (the `Mesh2d` type in WGSL mirrors this). - In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and `get_previous_model_matrix` were renamed to `get_world_from_local` and `get_previous_world_from_local`. - In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed to `get_world_from_local`.	2024-06-03 16:56:53 +00:00
JMS55	5536079945	Meshlet single pass depth downsampling (SPD) (#13003 ) # Objective - Using multiple raster passes to generate the depth pyramid is extremely slow - Pulling data from the source image is the largest bottleneck, it's important to sample in a cache-aware pattern - Barriers and pipeline drain between the raster passes is the second largest bottleneck - Each separate RenderPass on the CPU is _really_ expensive ## Solution - Port [FidelityFX SPD](https://gpuopen.com/fidelityfx-spd) to WGSL, replacing meshlet's existing multiple raster passes with a ~~single~~ two compute dispatches. Lack of coherent buffers means we have to do the the last 64x64 tile from mip 7+ in a separate dispatch to ensure the mip 6 writes were flushed :( - Workgroup shared memory version only at the moment, as the subgroup operation is blocked by our upgrade to wgpu 0.20 #13186 - Don't enforce a power-of-2 depth pyramid texture size, simply scaling by 0.5 is fine	2024-06-03 12:41:14 +00:00
Aevyrie	b45786df41	Add Skybox Motion Vectors (#13617 ) # Objective - Add motion vector support to the skybox - This fixes the last remaining "gap" to complete the motion blur feature ## Solution - Add a pipeline for the skybox to write motion vectors to the prepass ## Testing - Used examples to test motion vectors using motion blur https://github.com/bevyengine/bevy/assets/2632925/74c0778a-7e77-4e68-8111-05791e4bfdd2 --------- Co-authored-by: Patrick Walton <pcwalton@mimiga.net>	2024-06-02 16:09:28 +00:00
JMS55	77ebabc4fe	Meshlet remove per-cluster data upload (#13125 ) # Objective - Per-cluster (instance of a meshlet) data upload is ridiculously expensive in both CPU and GPU time (8 bytes per cluster, millions of clusters, you very quickly run into PCIE bandwidth maximums, and lots of CPU-side copies and malloc). - We need to be uploading only per-instance/entity data. Anything else needs to be done on the GPU. ## Solution - Per instance, upload: - `meshlet_instance_meshlet_counts_prefix_sum` - An exclusive prefix sum over the count of how many clusters each instance has. - `meshlet_instance_meshlet_slice_starts` - The starting index of the meshlets for each instance within the `meshlets` buffer. - A new `fill_cluster_buffers` pass once at the start of the frame has a thread per cluster, and finds its instance ID and meshlet ID via a binary search of `meshlet_instance_meshlet_counts_prefix_sum` to find what instance it belongs to, and then uses that plus `meshlet_instance_meshlet_slice_starts` to find what number meshlet within the instance it is. The shader then writes out the per-cluster instance/meshlet ID buffers for later passes to quickly read from. - I've gone from 45 -> 180 FPS in my stress test scene, and saved ~30ms/frame of overall CPU/GPU time.	2024-05-04 19:56:19 +00:00
JMS55	e1a0da0fa6	Meshlet LOD-compatible two-pass occlusion culling (#12898 ) Keeping track of explicit visibility per cluster between frames does not work with LODs, and leads to worse culling (using the final depth buffer from the previous frame is more accurate). Instead, we need to generate a second depth pyramid after the second raster pass, and then use that in the first culling pass in the next frame to test if a cluster would have been visible last frame or not. As part of these changes, the write_index_buffer pass has been folded into the culling pass for a large performance gain, and to avoid tracking a lot of extra state that would be needed between passes. Prepass previous model/view stuff was adapted to work with meshlets as well. Also fixed a bug with materials, and other misc improvements. --------- Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Patrick Walton <pcwalton@mimiga.net> Co-authored-by: Robert Swain <robert.swain@gmail.com>	2024-04-28 05:30:20 +00:00
JMS55	6d6810c90d	Meshlet continuous LOD (#12755 ) Adds a basic level of detail system to meshlets. An extremely brief summary is as follows: * In `from_mesh.rs`, once we've built the first level of clusters, we group clusters, simplify the new mega-clusters, and then split the simplified groups back into regular sized clusters. Repeat several times (ideally until you can't anymore). This forms a directed acyclic graph (DAG), where the children are the meshlets from the previous level, and the parents are the more simplified versions of their children. The leaf nodes are meshlets formed from the original mesh. * In `cull_meshlets.wgsl`, each cluster selects whether to render or not based on the LOD bounding sphere (different than the culling bounding sphere) of the current meshlet, the LOD bounding sphere of its parent (the meshlet group from simplification), and the simplification error relative to its children of both the current meshlet and its parent meshlet. This kind of breaks two pass occlusion culling, which will be fixed in a future PR by using an HZB from the previous frame to get the initial list of occluders. Many, _many_ improvements to be done in the future https://github.com/bevyengine/bevy/issues/11518, not least of which is code quality and speed. I don't even expect this to work on many types of input meshes. This is just a basic implementation/draft for collaboration. Arguable how much we want to do in this PR, I'll leave that up to maintainers. I've erred on the side of "as basic as possible". References: * Slides 27-77 (video available on youtube) https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf * https://blog.traverseresearch.nl/creating-a-directed-acyclic-graph-from-a-mesh-1329e57286e5 * https://jglrxavpok.github.io/2024/01/19/recreating-nanite-lod-generation.html, https://jglrxavpok.github.io/2024/03/12/recreating-nanite-faster-lod-generation.html, https://jglrxavpok.github.io/2024/04/02/recreating-nanite-runtime-lod-selection.html, and https://github.com/jglrxavpok/Carrot * https://github.com/gents83/INOX/tree/master/crates/plugins/binarizer/src * https://cs418.cs.illinois.edu/website/text/nanite.html ![image](https://github.com/bevyengine/bevy/assets/47158642/e40bff9b-7d0c-4a19-a3cc-2aad24965977) ![image](https://github.com/bevyengine/bevy/assets/47158642/442c7da3-7761-4da7-9acd-37f15dd13e26) --------- Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com> Co-authored-by: Patrick Walton <pcwalton@mimiga.net>	2024-04-23 21:43:53 +00:00
BD103	7b8d502083	Fix beta lints (#12980 ) # Objective - Fixes #12976 ## Solution This one is a doozy. - Run `cargo +beta clippy --workspace --all-targets --all-features` and fix all issues - This includes: - Moving inner attributes to be outer attributes, when the item in question has both inner and outer attributes - Use `ptr::from_ref` in more scenarios - Extend the valid idents list used by `clippy:doc_markdown` with more names - Use `Clone::clone_from` when possible - Remove redundant `ron` import - Add backticks to so many identifiers and items - I'm sorry whoever has to review this --- ## Changelog - Added links to more identifiers in documentation.	2024-04-16 02:46:46 +00:00
James Liu	a4ed1b88b8	Relax BufferVec's type constraints (#12866 ) # Objective Since BufferVec was first introduced, `bytemuck` has added additional traits with fewer restrictions than `Pod`. Within BufferVec, we only rely on the constraints of `bytemuck::cast_slice` to a `u8` slice, which now only requires `T: NoUninit` which is a strict superset of `Pod` types. ## Solution Change out the `Pod` generic type constraint with `NoUninit`. Also taking the opportunity to substitute `cast_slice` with `must_cast_slice`, which avoids a runtime panic in place of a compile time failure if `T` cannot be used. --- ## Changelog Changed: `BufferVec` now supports working with types containing `NoUninit` but not `Pod` members. Changed: `BufferVec` will now fail to compile if used with a type that cannot be safely read from. Most notably, this includes ZSTs, which would previously always panic at runtime.	2024-04-05 02:11:41 +00:00
JMS55	4f20faaa43	Meshlet rendering (initial feature) (#10164 ) # Objective - Implements a more efficient, GPU-driven (https://github.com/bevyengine/bevy/issues/1342) rendering pipeline based on meshlets. - Meshes are split into small clusters of triangles called meshlets, each of which acts as a mini index buffer into the larger mesh data. Meshlets can be compressed, streamed, culled, and batched much more efficiently than monolithic meshes. ![image](https://github.com/bevyengine/bevy/assets/47158642/cb2aaad0-7a9a-4e14-93b0-15d4e895b26a) ![image](https://github.com/bevyengine/bevy/assets/47158642/7534035b-1eb7-4278-9b99-5322e4401715) # Misc * Future work: https://github.com/bevyengine/bevy/issues/11518 * Nanite reference: https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf Two pass occlusion culling explained very well: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501 --------- Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com>	2024-03-25 19:08:27 +00:00

9 Commits