Commit Graph

14 Commits

Author SHA1 Message Date
Vic
f57c7a43c4
reexport entity set collections in entity module (#18413)
# Objective

Unlike for their helper types, the import paths for
`unique_array::UniqueEntityArray`, `unique_slice::UniqueEntitySlice`,
`unique_vec::UniqueEntityVec`, `hash_set::EntityHashSet`,
`hash_map::EntityHashMap`, `index_set::EntityIndexSet`,
`index_map::EntityIndexMap` are quite redundant.

When looking at the structure of `hashbrown`, we can also see that while
both `HashSet` and `HashMap` have their own modules, the main types
themselves are re-exported to the crate level.

## Solution

Re-export the types in their shared `entity` parent module, and simplify
the imports where they're used.
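
In practice, the simplification looks like this (a sketch using the paths named above):

```rust
// Before: each type had to be imported through its own submodule.
// use bevy_ecs::entity::hash_set::EntityHashSet;
// use bevy_ecs::entity::unique_vec::UniqueEntityVec;

// After: the types are re-exported from the shared `entity` parent module.
use bevy_ecs::entity::{EntityHashSet, UniqueEntityVec};
```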
2025-03-30 03:51:14 +00:00
Zachary Harrold
5241e09671
Upgrade to Rust Edition 2024 (#17967)
# Objective

- Fixes #17960

## Solution

- Followed the [edition upgrade
guide](https://doc.rust-lang.org/edition-guide/editions/transitioning-an-existing-project-to-a-new-edition.html)

## Testing

- CI

---

## Summary of Changes

### Documentation Indentation

When using lists in documentation, indentation is now linted: subsequent
lines within the same list item must start at the same indentation level
as the item.

```rust
/* Valid */
/// - Item 1
///   Run-on sentence.
/// - Item 2
struct Foo;

/* Invalid */
/// - Item 1
///     Run-on sentence.
/// - Item 2
struct Foo;
```

### Implicit `!` to `()` Conversion

`!` (the never return type, returned by `panic!`, etc.) no longer
implicitly converts to `()`. This is particularly painful for systems
with `todo!` or `panic!` statements, as they will no longer be functions
returning `()` (or `Result<()>`), making them invalid systems for
functions like `add_systems`. The ideal fix would be to accept functions
returning `!` (or rather, _not_ returning), but this is blocked on the
[stabilisation of the `!` type
itself](https://doc.rust-lang.org/std/primitive.never.html), which has
not yet happened.

The "simple" fix would be to add an explicit `-> ()` to system
signatures (e.g., `|| { todo!() }` becomes `|| -> () { todo!() }`).
However, this is _also_ banned, as there is an existing lint which (IMO,
incorrectly) marks this as an unnecessary annotation.

So, the "fix" (read: workaround) is to put these kinds of `|| -> ! { ...
}` closures into variables and give the variable an explicit type (e.g.,
`fn()`).

```rust
// Valid
let system: fn() = || todo!("Not implemented yet!");
app.add_systems(..., system);

// Invalid
app.add_systems(..., || todo!("Not implemented yet!"));
```

### Temporary Variable Lifetimes

The order in which temporary variables are dropped has changed. The
simple fix here is _usually_ to just assign temporaries to a named
variable before use.
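
A minimal sketch of that workaround (a hypothetical example, not code from this PR):

```rust
use std::sync::Mutex;

fn main() {
    let mutex = Mutex::new(0);
    // Rather than relying on when a temporary guard inside a larger
    // expression gets dropped, bind it to a named variable so the drop
    // point is explicit and edition-independent.
    let mut guard = mutex.lock().unwrap();
    *guard += 1;
    drop(guard); // release the lock before re-locking below
    assert_eq!(*mutex.lock().unwrap(), 1);
}
```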

### `gen` is a keyword

We can no longer use the name `gen` as it is reserved for a future
generator syntax. This involved replacing uses of the name `gen` with
`r#gen` (the raw-identifier syntax).
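
For example:

```rust
fn main() {
    // `gen` is reserved in edition 2024, so spell it as a raw identifier:
    let r#gen = 42;
    println!("{}", r#gen);
}
```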

### Formatting has changed

Use statements have had the order of imports changed, causing a
substantial +/-3,000 line diff when applied. For now, I have opted out of
this change by amending `rustfmt.toml`:

```toml
style_edition = "2021"
```

This preserves the original formatting for now, reducing the size of
this PR. It would be a simple followup to update this to 2024 and run
`cargo fmt`.

### New `use<>` Opt-Out Syntax

Lifetimes are now implicitly included in RPIT types. There was a handful
of instances where it needed to be added to satisfy the borrow checker,
but there may be more cases where it _should_ be added to avoid
breakages in user code.
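
A sketch of the syntax (the function itself is hypothetical, not from this PR):

```rust
// In edition 2024, RPIT captures all in-scope lifetimes implicitly;
// `+ use<..>` makes the captured set explicit.
fn evens<'a>(data: &'a [u32]) -> impl Iterator<Item = u32> + use<'a> {
    data.iter().copied().filter(|n| n % 2 == 0)
}
```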

### `MyUnitStruct { .. }` is an invalid pattern

Previously, you could match against unit structs (and unit enum
variants) with a `{ .. }` destructuring. This is no longer valid.
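
For example (a hypothetical unit struct, not from this PR):

```rust
struct Marker;

fn main() {
    match Marker {
        // Previously `Marker { .. }` was also accepted here; now match the
        // unit struct directly:
        Marker => println!("matched"),
    }
}
```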

### Pretty much every use of `ref` and `mut` is gone

Pattern binding has changed to the point where these terms are largely
unused now. They still serve a purpose, but it is far more niche now.
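
A sketch of the match-ergonomics pattern that makes `ref` unnecessary in most places (hypothetical types, not this PR's code):

```rust
struct Player {
    name: String,
}

fn greet(player: &Player) {
    // Matching on a reference binds fields by reference automatically,
    // so the old `let Player { ref name } = *player;` form isn't needed.
    let Player { name } = player; // name: &String
    println!("hello {name}");
}

fn main() {
    greet(&Player { name: "alice".into() });
}
```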

### `iter::repeat(...).take(...)` is bad

New lint recommends using the more explicit `iter::repeat_n(..., ...)`
instead.
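
For example:

```rust
use std::iter;

fn main() {
    // The lint prefers the explicit, length-carrying form:
    let old: Vec<u32> = iter::repeat(7).take(3).collect();
    let new: Vec<u32> = iter::repeat_n(7, 3).collect();
    assert_eq!(old, new);
}
```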

## Migration Guide

The lifetimes of functions using return-position impl-trait (RPIT) are
likely _more_ conservative than they had been previously. If you
encounter lifetime issues with such a function, please create an issue
to investigate the addition of `+ use<...>`.

## Notes

- Check the individual commits for a clearer breakdown for what
_actually_ changed.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2025-02-24 03:54:47 +00:00
JMS55
2fd4cc4937
Meshlet texture atomics (#17765)
* Use texture atomics rather than buffer atomics for the visbuffer
(haven't tested perf on a raster-heavy scene yet)
* Unfortunately, clearing the visbuffer now requires a compute pass:
wgpu's clear_texture function internally uses a buffer -> image copy
that's insanely expensive. Ideally it should be using
vkCmdClearColorImage, which I've opened an issue for:
https://github.com/gfx-rs/wgpu/issues/7090. For now we'll have to stick
with a custom compute pass and all the extra code that brings.
* Faster resolve depth pass by discarding 0 depth pixels instead of
redundantly writing zero (2x faster for big depth textures like shadow
views)
2025-02-12 18:15:43 +00:00
Patrick Walton
dda97880c4
Implement experimental GPU two-phase occlusion culling for the standard 3D mesh pipeline. (#17413)
*Occlusion culling* allows the GPU to skip the vertex and fragment
shading overhead for objects that can be quickly proved to be invisible
because they're behind other geometry. A depth prepass already
eliminates most fragment shading overhead for occluded objects, but the
vertex shading overhead, as well as the cost of testing and rejecting
fragments against the Z-buffer, is presently unavoidable for standard
meshes. We currently perform occlusion culling only for meshlets. But
other meshes, such as skinned meshes, can benefit from occlusion culling
too in order to avoid the transform and skinning overhead for unseen
meshes.

This commit adapts the same [*two-phase occlusion culling*] technique
that meshlets use to Bevy's standard 3D mesh pipeline when the new
`OcclusionCulling` component, as well as the `DepthPrepass` component,
are present on the camera. It has these steps:

1. *Early depth prepass*: We use the hierarchical Z-buffer from the
previous frame to cull meshes for the initial depth prepass, effectively
rendering only the meshes that were visible in the last frame.

2. *Early depth downsample*: We downsample the depth buffer to create
another hierarchical Z-buffer, this time with the current view
transform.

3. *Late depth prepass*: We use the new hierarchical Z-buffer to test
all meshes that weren't rendered in the early depth prepass. Any meshes
that pass this check are rendered.

4. *Late depth downsample*: Again, we downsample the depth buffer to
create a hierarchical Z-buffer in preparation for the early depth
prepass of the next frame. This step is done after all the rendering, in
order to account for custom phase items that might write to the depth
buffer.
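
Opting in is a matter of adding both components to the camera; a minimal
sketch (the `OcclusionCulling` import path is an assumption here; see the
`occlusion_culling` example for the real setup):

```rust
use bevy::core_pipeline::prepass::DepthPrepass;
use bevy::prelude::*;
use bevy::render::experimental::occlusion_culling::OcclusionCulling;

fn setup(mut commands: Commands) {
    // Both `DepthPrepass` and `OcclusionCulling` must be present for
    // two-phase occlusion culling to run.
    commands.spawn((Camera3d::default(), DepthPrepass, OcclusionCulling));
}
```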

Note that this patch has no effect on the per-mesh CPU overhead for
occluded objects, which remains high for a GPU-driven renderer due to
the lack of `cold-specialization` and retained bins. If
`cold-specialization` and retained bins weren't on the horizon, then a
more traditional approach like potentially visible sets (PVS) or low-res
CPU rendering would probably be more efficient than the GPU-driven
approach that this patch implements for most scenes. However, at this
point the amount of effort required to implement a PVS baking tool or a
low-res CPU renderer would probably be greater than landing
`cold-specialization` and retained bins, and the GPU driven approach is
the more modern one anyway. It does mean that the performance
improvements from occlusion culling as implemented in this patch *today*
are likely to be limited, because of the high CPU overhead for occluded
meshes.

Note also that this patch currently doesn't implement occlusion culling
for 2D objects or shadow maps. Those can be addressed in a follow-up.
Additionally, note that the techniques in this patch require compute
shaders, which excludes support for WebGL 2.

This PR is marked experimental because of known precision issues with
the downsampling approach when applied to non-power-of-two framebuffer
sizes (i.e. most of them). These precision issues can, in rare cases,
cause objects to be judged occluded that in fact are not. (I've never
seen this in practice, but I know it's possible; it tends to be likelier
to happen with small meshes.) As a follow-up to this patch, we desire to
switch to the [SPD-based hi-Z buffer shader from the Granite engine],
which doesn't suffer from these problems, at which point we should be
able to graduate this feature from experimental status. I opted not to
include that rewrite in this patch for two reasons: (1) @JMS55 is
planning on doing the rewrite to coincide with the new availability of
image atomic operations in Naga; (2) to reduce the scope of this patch.

A new example, `occlusion_culling`, has been added. It demonstrates
objects becoming quickly occluded and disoccluded by dynamic geometry
and shows the number of objects that are actually being rendered. Also,
a new `--occlusion-culling` switch has been added to `scene_viewer`, in
order to make it easy to test this patch with large scenes like Bistro.

[*two-phase occlusion culling*]:
https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

[Aaltonen SIGGRAPH 2015]:
https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf

[Some literature]:
https://gist.github.com/reduz/c5769d0e705d8ab7ac187d63be0099b5?permalink_comment_id=5040452#gistcomment-5040452

[SPD-based hi-Z buffer shader from the Granite engine]:
https://github.com/Themaister/Granite/blob/master/assets/shaders/post/hiz.comp

## Migration guide

* When enqueuing a custom mesh pipeline, work item buffers are now
created with
`bevy::render::batching::gpu_preprocessing::get_or_create_work_item_buffer`,
not `PreprocessWorkItemBuffers::new`. See the
`specialized_mesh_pipeline` example.

## Showcase

Occlusion culling example:
![Screenshot 2025-01-15
175051](https://github.com/user-attachments/assets/1544f301-68a3-45f8-84a6-7af3ad431258)

Bistro zoomed out, before occlusion culling:
![Screenshot 2025-01-16
185425](https://github.com/user-attachments/assets/5114bbdf-5dec-4de9-b17e-7aa77e7b61ed)

Bistro zoomed out, after occlusion culling:
![Screenshot 2025-01-16
184949](https://github.com/user-attachments/assets/9dd67713-656c-4276-9768-6d261ca94300)

In this scene, occlusion culling reduces the number of meshes Bevy has
to render from 1591 to 585.
2025-01-27 05:02:46 +00:00
Alice Cecile
44ad3bf62b
Move Resource trait to its own file (#17469)
# Objective

`bevy_ecs`'s `system` module is something of a grab bag, and *very*
large. This is particularly true for the `system_param` module, which is
more than 2k lines long!

While it could be defensible to put `Res` and `ResMut` there (lol no
they're in change_detection.rs, obviously), it doesn't make any sense to
put the `Resource` trait there. This is confusing to navigate (and
painful to work on and review).

## Solution

- Create a root level `bevy_ecs/resource.rs` module to mirror
`bevy_ecs/component.rs`
- move the `Resource` trait to that module
- move the `Resource` derive macro to that module as well (Rust really
likes when you pun on the names of the derive macro and trait and put
them in the same path)
- fix all of the imports

## Notes to reviewers

- We could probably move more stuff into here, but I wanted to keep this
PR as small as possible given the absurd level of import changes.
- This PR is ground work for my upcoming attempts to store resource data
on components (resources-as-entities). Splitting this code out will make
the work and review a bit easier, and is the sort of overdue refactor
that's good to do as part of more meaningful work.

## Testing

cargo build works!

## Migration Guide

`bevy_ecs::system::Resource` has been moved to
`bevy_ecs::resource::Resource`.
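
In practice:

```rust
// Before:
// use bevy_ecs::system::Resource;

// After:
use bevy_ecs::resource::Resource;

#[derive(Resource, Default)]
struct Score(u32);
```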
2025-01-21 19:47:08 +00:00
Alice Cecile
5a9bc28502
Support non-Vec data structures in relations (#17447)
# Objective

The existing `RelationshipSourceCollection` uses `Vec` as the only
possible backing for our relationships. While a reasonable choice,
benchmarking use cases might reveal that a different data type is better
or faster.

For example:

- Not all relationships require a stable ordering between the
relationship sources (i.e. children). In cases where we a) have many
such relations and b) don't care about the ordering between them, a hash
set is likely a better data structure than a `Vec`.
- The number of children-like entities may be small on average, and a
`smallvec` may be faster

## Solution

- Implement `RelationshipSourceCollection` for `EntityHashSet`, our
custom entity-optimized `HashSet`.
- ~~Implement `DoubleEndedIterator` for `EntityHashSet` to make things
compile.~~
  - This implementation was cursed and very surprising.
- Instead, by moving the iterator type on `RelationshipSourceCollection`
from an erased RPITIT to an explicit associated type, we can add a trait
bound on the offending methods!
- Implement `RelationshipSourceCollection` for `SmallVec`

## Testing

I've added a pair of new tests to make sure this pattern compiles
successfully in practice!

## Migration Guide

`EntityHashSet` and `EntityHashMap` are no longer re-exported in
`bevy_ecs::entity` directly. If you were not using `bevy_ecs` / `bevy`'s
`prelude`, you can access them through their now-public modules,
`hash_set` and `hash_map` instead.
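
In practice (module paths as named above):

```rust
// Before:
// use bevy_ecs::entity::{EntityHashMap, EntityHashSet};

// After:
use bevy_ecs::entity::hash_map::EntityHashMap;
use bevy_ecs::entity::hash_set::EntityHashSet;
```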

## Notes to reviewers

The `EntityHashSet::Iter` type needs to be public for this impl to be
allowed. I initially renamed it to something that wasn't ambiguous and
re-exported it, but as @Victoronz pointed out, that was somewhat
unidiomatic.

In
1a8564898f,
I instead made the `hash_set` module public (and its sister `hash_map`),
and removed the re-export. I prefer this design (give me module docs
please), but it leads to a lot of churn in this PR.

Let me know which you'd prefer, and if you'd like me to split that
change out into its own micro PR.
2025-01-20 21:26:08 +00:00
MichiRecRoom
3742e621ef
Allow clippy::too_many_arguments to lint without warnings (#17249)
# Objective
Many instances of `clippy::too_many_arguments` linting happen to be on
systems - functions which we don't call manually, and thus there's not
much reason to worry about the argument count.

## Solution
Allow `clippy::too_many_arguments` globally, and remove all lint
attributes related to it.
2025-01-09 07:26:15 +00:00
JMS55
3fb6cefb2f
Meshlet fill cluster buffers rewritten (#15955)
# Objective
- Make the meshlet fill cluster buffers pass slightly faster
- Address https://github.com/bevyengine/bevy/issues/15920 for meshlets
- Added PreviousGlobalTransform as a required meshlet component to avoid
extra archetype moves, slightly alleviating
https://github.com/bevyengine/bevy/issues/14681 for meshlets
- Enforce that MeshletPlugin::cluster_buffer_slots is not greater than
2^25 (glitches will occur otherwise). Technically this field controls
post-lod/culling cluster count, and the issue is on pre-lod/culling
cluster count, but it's still valid now, and in the future this will be
more true.

Needs to be merged after https://github.com/bevyengine/bevy/pull/15846
and https://github.com/bevyengine/bevy/pull/15886

## Solution

- Old pass dispatched a thread per cluster, and did a binary search over
the instances to find which instance the cluster belongs to, and what
meshlet index within the instance it is.
- New pass dispatches a workgroup per instance, and has the workgroup
loop over all meshlets in the instance in order to write out the cluster
data.
- Use a push constant instead of arrayLength to fix the linked bug
- Remap 1d->2d dispatch for software raster only if actually needed to
save on spawning excess workgroups

## Testing

- Did you test these changes? If so, how?
- Ran the meshlet example, and an example with 1041 instances of 32217
meshlets per instance. Profiled the second scene with nsight, went from
0.55ms -> 0.40ms. Small savings. We're pretty much VRAM bandwidth bound
at this point.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Run the meshlet example

## Changelog (non-meshlets)
- PreviousGlobalTransform now implements the Default trait
2024-10-23 19:18:49 +00:00
JMS55
9d54fe0370
Meshlet new error projection (#15846)
* New error projection code taken from @zeux's meshoptimizer nanite.cpp
demo for determining LOD (thanks zeux!)
* Builder: `compute_lod_group_data()`
* Runtime: `lod_error_is_imperceptible()`
2024-10-22 20:14:30 +00:00
JMS55
aa626e4f0b
Per-meshlet compressed vertex data (#15643)
# Objective
- Prepare for streaming by storing vertex data per-meshlet, rather than
per-mesh (this means duplicating vertices per-meshlet)
- Compress vertex data to reduce the cost of this

## Solution
The important parts are in from_mesh.rs, the changes to the Meshlet type
in asset.rs, and the changes in meshlet_bindings.wgsl. Everything else
is pretty secondary/boilerplate/straightforward changes.

- Positions are quantized in centimeters with a user-provided power of 2
factor (ideally auto-determined, but that's a TODO for the future),
encoded as an offset relative to the minimum value within the meshlet,
and then stored as a packed list of bits using the minimum number of
bits needed for each vertex position channel for that meshlet
- E.g. quantize positions (lossily; throws away precision that's not
needed, leading to fewer bits in the bitstream encoding)
- Get the min/max quantized value of each X/Y/Z channel of the quantized
positions within a meshlet
- Encode values relative to the min value of the meshlet. E.g. convert
from [min, max] to [0, max - min]
- The new max value in the meshlet is (max - min), which only takes N
bits, so we only need N bits to store each channel within the meshlet
(lossless)
- We can store the min value and that it takes N bits per channel in the
meshlet metadata, and reconstruct the position from the bitstream
- Normals are octahedral encoded and then snorm2x16 packed and stored as
a single u32.
- Would be better to implement the precise variant of octahedral encoding
for extra precision (no extra decode cost), but decided to keep it
simple for now and leave that as a followup
- Tried doing a quantizing and bitstream encoding scheme like I did for
positions, but struggled to get it smaller. Decided to go with this for
simplicity for now
- UVs are uncompressed and take a full 64 bits per vertex, which is
expensive
  - In the future this should be improved
- Tangents, as of the previous PR, are not explicitly stored and are
instead derived from screen space gradients
- While I'm here, split up MeshletMeshSaverLoader into two separate
types

Other future changes include implementing a smaller encoding of triangle
data (3 u8 indices = 24 bits per triangle currently), and more
disk-oriented compression schemes.
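
As a toy sketch of the position scheme above, for a single channel of a
single meshlet (all names and details are illustrative, not the PR's
actual code):

```rust
/// Encode one position channel (e.g. X) of one meshlet. Returns the
/// meshlet-relative minimum, the bits per value, and the rebased values
/// destined for the bitstream.
fn encode_channel(positions: &[f32], quantization_factor: i32) -> (i32, u32, Vec<u32>) {
    // 1. Quantize (lossy): snap each value to a power-of-two grid.
    let scale = 2f32.powi(quantization_factor);
    let quantized: Vec<i32> = positions.iter().map(|p| (p * scale).round() as i32).collect();

    // 2. Re-base against the meshlet minimum: [min, max] -> [0, max - min].
    let min = quantized.iter().copied().min().unwrap();
    let rebased: Vec<u32> = quantized.iter().map(|q| (q - min) as u32).collect();

    // 3. The rebased maximum fits in N bits, so each value needs only N
    //    bits in the bitstream (lossless); `min` and N live in the
    //    meshlet metadata.
    let max = rebased.iter().copied().max().unwrap();
    let bits_per_value = 32 - max.leading_zeros();
    (min, bits_per_value, rebased)
}
```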

References:
* "A Deep Dive into UE5's Nanite Virtualized Geometry"
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf#page=128
(also available on youtube)
* "Towards Practical Meshlet Compression"
https://arxiv.org/pdf/2404.06359
* "Vertex quantization in Omniforce Game Engine"
https://daniilvinn.github.io/2024/05/04/omniforce-vertex-quantization.html

## Testing

- Did you test these changes? If so, how?
- Converted the Stanford bunny, and rendered it with a debug material
showing normals, and confirmed that it's identical to what's on main.
EDIT: See additional testing in the comments below.
- Are there any parts that need more testing?
- Could use some more size comparisons on various meshes, and testing
different quantization factors. Not sure if 4 is a good default. EDIT:
See additional testing in the comments below.
- Also did not test runtime performance of the shaders. EDIT: See
additional testing in the comments below.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
- Use my unholy script, replacing the meshlet example
https://paste.rs/7xQHk.rs (must make MeshletMesh fields pub instead of
pub(crate), must add lz4_flex as a dev-dependency, must compile with the
meshlet and meshlet_processor features; the mesh must have only
positions, normals, and UVs, no vertex colors or tangents)

---

## Migration Guide
- TBD by JMS55 at the end of the release
2024-10-08 18:42:55 +00:00
Zachary Harrold
d70595b667
Add core and alloc over std Lints (#15281)
# Objective

- Fixes #6370
- Closes #6581

## Solution

- Added the following lints to the workspace:
  - `std_instead_of_core`
  - `std_instead_of_alloc`
  - `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.
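
For instance, a crate that previously leaned on `std` for collections
needs its crate root amended before the `alloc::` paths resolve (a
sketch, not a specific crate from this PR):

```rust
// In the crate root:
extern crate alloc;

use alloc::{string::String, vec::Vec};
use core::mem::take; // `core` over `std`, per the new lints

fn drain_names(names: &mut Vec<String>) -> Vec<String> {
    take(names)
}
```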

## Testing

- Ran CI locally

## Migration Guide

The MSRV is now 1.81. Please update to this version or higher.

## Notes

- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
Benjamin Brienen
1b8c1c1242
simplify std::mem references (#15315)
# Objective
- Fixes #15314

## Solution

- Remove unnecessary `use` statements and simplify references to those functions.
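
A hedged sketch of the kind of cleanup this describes (assuming it
targets items like `size_of`, which the standard prelude re-exports as
of Rust 1.80):

```rust
fn main() {
    // Before: let bytes = std::mem::size_of::<u64>();
    let bytes = size_of::<u64>(); // the `std::mem::` path is unnecessary
    assert_eq!(bytes, 8);
}
```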

## Testing

CI
2024-09-19 21:28:16 +00:00
JMS55
a0faf9cd01
More triangles/vertices per meshlet (#15023)
### Builder changes
- Increased meshlet max vertices/triangles from 64v/64t to 255v/128t
(meshoptimizer won't allow 256v sadly). This gives us a much greater
percentage of meshlets with max triangle count (128). Still not perfect;
we still end up with some tiny <=10 triangle meshlets that never really
get simplified, but it's progress.
- Removed the error target limit. Now we allow meshoptimizer to simplify
as much as possible. No reason to cap this out, as the cluster culling
code will choose a good LOD level anyways. Again leads to higher quality
LOD trees.
- After some discussion and consulting the Nanite slides again, changed
meshlet group error from _adding_ the max child's error to the group
error, to doing `group_error = max(group_error, max_child_error)`. Error
is already cumulative between LODs as the edges we're collapsing during
simplification get longer each time.
- Bumped the 65% simplification threshold to allow up to 95% of the
original geometry (e.g. accept simplification as valid even if we only
simplified 5% of the triangles). This gives us closer to
log2(initial_meshlet_count) LOD levels, and fewer meshlet roots in the
DAG.

Still more work to be done in the future here. Maybe trying METIS for
meshlet building instead of meshoptimizer.

Using ~8 clusters per group instead of ~4 might also make a big
difference. The Nanite slides say that they have 8-32 meshlets per
group, suggesting some kind of heuristic. Unfortunately meshopt's
compute_cluster_bounds won't work with large groups atm
(https://github.com/zeux/meshoptimizer/discussions/750#discussioncomment-10562641)
so hard to test.

Based on discussion from
https://github.com/bevyengine/bevy/discussions/14998,
https://github.com/zeux/meshoptimizer/discussions/750, and discord.

### Runtime changes
- cluster:triangle packed IDs are now stored 25:7 instead of 26:6 bits,
as max triangles per cluster are now 128 instead of 64
- Hardware raster now spawns 128 * 3 vertices instead of 64 * 3 vertices
to account for the new max triangles limit
- Hardware raster now outputs NaN triangles (0 / 0) instead of
zero-positioned triangles for extra vertex invocations over the cluster
triangle count. Shouldn't really make a difference, I don't think, but I
did it anyway.
- Software raster now does 128 threads per workgroup instead of 64
threads. Each thread now loads, projects, and caches a vertex (vertices
0-127), and then if needed does so again (vertices 128-254). Each thread
then rasterizes one of 128 triangles.
- Fixed a bug with `needs_dispatch_remap`. I had the condition backwards
in my last PR; I probably committed it by accident after testing the
non-default code path on my GPU.
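
The new 25:7 packing from the first bullet, as a toy sketch (names
assumed, not the PR's shader code):

```rust
const TRIANGLE_BITS: u32 = 7; // 2^7 = 128 max triangles per cluster

fn pack(cluster_id: u32, triangle_id: u32) -> u32 {
    debug_assert!(cluster_id < (1 << 25) && triangle_id < (1 << TRIANGLE_BITS));
    (cluster_id << TRIANGLE_BITS) | triangle_id
}

fn unpack(packed: u32) -> (u32, u32) {
    (packed >> TRIANGLE_BITS, packed & ((1 << TRIANGLE_BITS) - 1))
}
```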
2024-09-08 17:55:57 +00:00
JMS55
6cc96f4c1f
Meshlet software raster + start of cleanup (#14623)
# Objective
- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs

## Solution
- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer,
where the left 32 bits encode depth, and the right 32 bits encode the
existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga
doesn't support atomic ops on textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per
triangle) to raster, the culling pass now writes out a buffer of just
cluster IDs (per cluster, so less memory allocated, cheaper to write
out).
  - Clusters for software raster are allocated from the left side
- Clusters for hardware raster are allocated in the same buffer, from
the right side
- The buffer size is fixed at MeshletPlugin build time, and should be
set to a reasonable value for your scene (no warning on overflow, and no
good way to determine what value you need outside of renderdoc - I plan
to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster
selection for each cluster. The existing code is just a placeholder. I
need to profile on a release scene and come up with a heuristic,
probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but
I don't want to spend time improving it as the entire shader/pass is
getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup
loads and transforms the <=64 vertices of the cluster, and then
rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger
triangles (still smaller than hardware is good at), and brute-force for
very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does
an atomicMax() on the visbuffer to store the results, copying how Nanite
works
- On devices with a low max workgroups per dispatch limit, an extra
compute pass is inserted before software raster to convert from a 1d to
2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet, I'm
leaving that for a future PR since I get usable results without it for
now
- Resources used:
https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters
6-8 of
https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet,
instead of the actual meshlet vertex count. Extra invocations just
early-exit.
- While this is slower than the existing system, hardware draws should
be rare now that software raster is usable, and it saves a ton of memory
using the unified cluster ID buffer. This would be fixed if wgpu had
support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster
pass also does the same atomic visbuffer writes that software raster
uses.
- We have to bind a dummy render target anyways, as wgpu doesn't
currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization
passes.
- If we had async compute queues, we could overlap the software and
hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer
node, and write out view depth and material ID depth textures
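
The 64-bit visbuffer value from the first bullet above, as a toy sketch
(names assumed; the real version lives in WGSL):

```rust
// Depth occupies the high 32 bits, so a single atomic max on the packed
// value keeps the cluster + triangle IDs of the winning (nearest,
// assuming reverse-Z) fragment.
fn pack_visbuffer(depth_bits: u32, cluster_and_triangle_id: u32) -> u64 {
    ((depth_bits as u64) << 32) | cluster_and_triangle_id as u64
}
```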

### Misc changes
- Fixed cluster culling importing, but never actually using, the previous
view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet
mesh
- Split up the gpu_scene module into meshlet_mesh_manager, instance_manager,
and resource_manager
- resource_manager is still too complex and inefficient (extract and
prepare are way too expensive). I plan on improving this in a future PR,
but for now ResourceManager is mostly a 1:1 port of the leftover
MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material
shade pass, as well as some other misc renaming (in the future, these
will be compute shaders even, and not actual draw calls)

---

## Migration Guide
- TBD (ask me at the end of the release for meshlet changes as a whole)

---------

Co-authored-by: vero <email@atlasdostal.com>
2024-08-26 17:54:34 +00:00