forestia/bevy

Author	SHA1	Message	Date
JMS55	9cc7e7c080	Meshlet screenspace-derived tangents (#15084 ) * Save 16 bytes per vertex by calculating tangents in the shader at runtime, rather than storing them in the vertex data. * Based on https://jcgt.org/published/0009/03/04, https://www.jeremyong.com/graphics/2023/12/16/surface-gradient-bump-mapping. * Fixed visbuffer resolve to use the updated algorithm that flips ddy correctly * Added some more docs about meshlet material limitations, and some TODOs about transforming UV coordinates for the future. ![image](https://github.com/user-attachments/assets/222d8192-8c82-4d77-945d-53670a503761) For testing add a normal map to the bunnies with StandardMaterial like below, and then test that on both main and this PR (make sure to download the correct bunny for each). Results should be mostly identical. ```rust normal_map_texture: Some(asset_server.load_with_settings( "textures/BlueNoise-Normal.png", \|settings: &mut ImageLoaderSettings\| settings.is_srgb = false, )), ```	2024-09-29 18:39:25 +00:00
Zachary Harrold	d70595b667	Add `core` and `alloc` over `std` Lints (#15281 ) # Objective - Fixes #6370 - Closes #6581 ## Solution - Added the following lints to the workspace: - `std_instead_of_core` - `std_instead_of_alloc` - `alloc_instead_of_core` - Used `cargo +nightly fmt` with [item level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A) to split all `use` statements into single items. - Used `cargo clippy --workspace --all-targets --all-features --fix --allow-dirty` to _attempt_ to resolve the new linting issues, and intervened where the lint was unable to resolve the issue automatically (usually due to needing an `extern crate alloc;` statement in a crate root). - Manually removed certain uses of `std` where negative feature gating prevented `--all-features` from finding the offending uses. - Used `cargo +nightly fmt` with [crate level use formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A) to re-merge all `use` statements matching Bevy's previous styling. - Manually fixed cases where the `fmt` tool could not re-merge `use` statements due to conditional compilation attributes. ## Testing - Ran CI locally ## Migration Guide The MSRV is now 1.81. Please update to this version or higher. ## Notes - This is a _massive_ change to try and push through, which is why I've outlined the semi-automatic steps I used to create this PR, in case this fails and someone else tries again in the future. - Making this change has no impact on user code, but does mean Bevy contributors will be warned to use `core` and `alloc` instead of `std` where possible. - This lint is a critical first step towards investigating `no_std` options for Bevy. --------- Co-authored-by: François Mockers <francois.mockers@vleue.com>	2024-09-27 00:59:59 +00:00
Benjamin Brienen	1b8c1c1242	simplify std::mem references (#15315 ) # Objective - Fixes #15314 ## Solution - Remove unnecessary usings and simplify references to those functions. ## Testing CI	2024-09-19 21:28:16 +00:00
JMS55	6cc96f4c1f	Meshlet software raster + start of cleanup (#14623 ) # Objective - Faster meshlet rasterization path for small triangles - Avoid having to allocate and write out a triangle buffer - Refactor gpu_scene.rs ## Solution - Replace the 32bit visbuffer texture with a 64bit visbuffer buffer, where the left 32 bits encode depth, and the right 32 bits encode the existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga doesn't support atomic ops on textures yet. - Instead of writing out a buffer of packed cluster + triangle IDs (per triangle) to raster, the culling pass now writes out a buffer of just cluster IDs (per cluster, so less memory allocated, cheaper to write out). - Clusters for software raster are allocated from the left side - Clusters for hardware raster are allocated in the same buffer, from the right side - The buffer size is fixed at MeshletPlugin build time, and should be set to a reasonable value for your scene (no warning on overflow, and no good way to determine what value you need outside of renderdoc - I plan to fix this in a future PR adding a meshlet stats overlay) - Currently I don't have a heuristic for software vs hardware raster selection for each cluster. The existing code is just a placeholder. I need to profile on a release scene and come up with a heuristic, probably in a future PR. - The culling shader is getting pretty hard to follow at this point, but I don't want to spend time improving it as the entire shader/pass is getting rewritten/replaced in the near future. - Software raster is a compute workgroup per-cluster. Each workgroup loads and transforms the <=64 vertices of the cluster, and then rasterizes the <=64 triangles of the cluster. - Two variants are implemented: Scanline for clusters with any larger triangles (still smaller than hardware is good at), and brute-force for very very tiny triangles - Once the shader determines that a pixel should be filled in, it does an atomicMax() on the visbuffer to store the results, copying how Nanite works - On devices with a low max workgroups per dispatch limit, an extra compute pass is inserted before software raster to convert from a 1d to 2d dispatch (I don't think 3d would ever be necessary). - I haven't implemented the top-left rule or subpixel precision yet, I'm leaving that for a future PR since I get usable results without it for now - Resources used: https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters 6-8 of https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index - Hardware raster now spawns 64*3 vertex invocations per meshlet, instead of the actual meshlet vertex count. Extra invocations just early-exit. - While this is slower than the existing system, hardware draws should be rare now that software raster is usable, and it saves a ton of memory using the unified cluster ID buffer. This would be fixed if wgpu had support for mesh shaders. - Instead of writing to a color+depth attachment, the hardware raster pass also does the same atomic visbuffer writes that software raster uses. - We have to bind a dummy render target anyways, as wgpu doesn't currently support render passes without any attachments - Material IDs are no longer written out during the main rasterization passes. - If we had async compute queues, we could overlap the software and hardware raster passes. - New material and depth resolve passes run at the end of the visbuffer node, and write out view depth and material ID depth textures ### Misc changes - Fixed cluster culling importing, but never actually using the previous view uniforms when doing occlusion culling - Fixed incorrectly adding the LOD error twice when building the meshlet mesh - Splitup gpu_scene module into meshlet_mesh_manager, instance_manager, and resource_manager - resource_manager is still too complex and inefficient (extract and prepare are way too expensive). I plan on improving this in a future PR, but for now ResourceManager is mostly a 1:1 port of the leftover MeshletGpuScene bits. - Material draw passes have been renamed to the more accurate material shade pass, as well as some other misc renaming (in the future, these will be compute shaders even, and not actual draw calls) --- ## Migration Guide - TBD (ask me at the end of the release for meshlet changes as a whole) --------- Co-authored-by: vero <email@atlasdostal.com>	2024-08-26 17:54:34 +00:00
JMS55	6d6810c90d	Meshlet continuous LOD (#12755 ) Adds a basic level of detail system to meshlets. An extremely brief summary is as follows: * In `from_mesh.rs`, once we've built the first level of clusters, we group clusters, simplify the new mega-clusters, and then split the simplified groups back into regular sized clusters. Repeat several times (ideally until you can't anymore). This forms a directed acyclic graph (DAG), where the children are the meshlets from the previous level, and the parents are the more simplified versions of their children. The leaf nodes are meshlets formed from the original mesh. * In `cull_meshlets.wgsl`, each cluster selects whether to render or not based on the LOD bounding sphere (different than the culling bounding sphere) of the current meshlet, the LOD bounding sphere of its parent (the meshlet group from simplification), and the simplification error relative to its children of both the current meshlet and its parent meshlet. This kind of breaks two pass occlusion culling, which will be fixed in a future PR by using an HZB from the previous frame to get the initial list of occluders. Many, _many_ improvements to be done in the future https://github.com/bevyengine/bevy/issues/11518, not least of which is code quality and speed. I don't even expect this to work on many types of input meshes. This is just a basic implementation/draft for collaboration. Arguable how much we want to do in this PR, I'll leave that up to maintainers. I've erred on the side of "as basic as possible". References: * Slides 27-77 (video available on youtube) https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf * https://blog.traverseresearch.nl/creating-a-directed-acyclic-graph-from-a-mesh-1329e57286e5 * https://jglrxavpok.github.io/2024/01/19/recreating-nanite-lod-generation.html, https://jglrxavpok.github.io/2024/03/12/recreating-nanite-faster-lod-generation.html, https://jglrxavpok.github.io/2024/04/02/recreating-nanite-runtime-lod-selection.html, and https://github.com/jglrxavpok/Carrot * https://github.com/gents83/INOX/tree/master/crates/plugins/binarizer/src * https://cs418.cs.illinois.edu/website/text/nanite.html ![image](https://github.com/bevyengine/bevy/assets/47158642/e40bff9b-7d0c-4a19-a3cc-2aad24965977) ![image](https://github.com/bevyengine/bevy/assets/47158642/442c7da3-7761-4da7-9acd-37f15dd13e26) --------- Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com> Co-authored-by: Patrick Walton <pcwalton@mimiga.net>	2024-04-23 21:43:53 +00:00
James Liu	e62a01f403	Make PersistentGpuBufferable a safe trait (#12744 ) # Objective Fixes #12727. All parts that `PersistentGpuBuffer` interact with should be 100% safe both on the CPU and the GPU: `Queue::write_buffer_with` zeroes out the slice being written to and when uploading to the GPU, and all slice writes are bounds checked on the CPU side. ## Solution Make `PersistentGpuBufferable` a safe trait. Enforce it's correct implementation via assertions. Re-enable `forbid(unsafe_code)` on `bevy_pbr`.	2024-03-29 13:14:34 +00:00
James Liu	56bcbb0975	Forbid unsafe in most crates in the engine (#12684 ) # Objective Resolves #3824. `unsafe` code should be the exception, not the norm in Rust. It's obviously needed for various use cases as it's interfacing with platforms and essentially running the borrow checker at runtime in the ECS, but the touted benefits of Bevy is that we are able to heavily leverage Rust's safety, and we should be holding ourselves accountable to that by minimizing our unsafe footprint. ## Solution Deny `unsafe_code` workspace wide. Add explicit exceptions for the following crates, and forbid it in almost all of the others. * bevy_ecs - Obvious given how much unsafe is needed to achieve performant results * bevy_ptr - Works with raw pointers, even more low level than bevy_ecs. * bevy_render - due to needing to integrate with wgpu * bevy_window - due to needing to integrate with raw_window_handle * bevy_utils - Several unsafe utilities used by bevy_ecs. Ideally moved into bevy_ecs instead of made publicly usable. * bevy_reflect - Required for the unsafe type casting it's doing. * bevy_transform - for the parallel transform propagation * bevy_gizmos - For the SystemParam impls it has. * bevy_assets - To support reflection. Might not be required, not 100% sure yet. * bevy_mikktspace - due to being a conversion from a C library. Pending safe rewrite. * bevy_dynamic_plugin - Inherently unsafe due to the dynamic loading nature. Several uses of unsafe were rewritten, as they did not need to be using them: * bevy_text - a case of `Option::unchecked` could be rewritten as a normal for loop and match instead of an iterator. * bevy_color - the Pod/Zeroable implementations were replaceable with bytemuck's derive macros.	2024-03-27 03:30:08 +00:00
JMS55	4f20faaa43	Meshlet rendering (initial feature) (#10164 ) # Objective - Implements a more efficient, GPU-driven (https://github.com/bevyengine/bevy/issues/1342) rendering pipeline based on meshlets. - Meshes are split into small clusters of triangles called meshlets, each of which acts as a mini index buffer into the larger mesh data. Meshlets can be compressed, streamed, culled, and batched much more efficiently than monolithic meshes. ![image](https://github.com/bevyengine/bevy/assets/47158642/cb2aaad0-7a9a-4e14-93b0-15d4e895b26a) ![image](https://github.com/bevyengine/bevy/assets/47158642/7534035b-1eb7-4278-9b99-5322e4401715) # Misc * Future work: https://github.com/bevyengine/bevy/issues/11518 * Nanite reference: https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf Two pass occlusion culling explained very well: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501 --------- Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com>	2024-03-25 19:08:27 +00:00

8 Commits