Commit Graph

25 Commits

Author SHA1 Message Date
Zachary Harrold
9bc0ae33c3
Move hashbrown and foldhash out of bevy_utils (#17460)
# Objective

- Contributes to #16877

## Solution

- Moved `hashbrown`, `foldhash`, and related types out of `bevy_utils`
and into `bevy_platform_support`
- Refactored the above to match the layout of these types in `std`.
- Updated crates as required.

## Testing

- CI

---

## Migration Guide

- The following items were moved out of `bevy_utils` and into
`bevy_platform_support::hash`:
  - `FixedState`
  - `DefaultHasher`
  - `RandomState`
  - `FixedHasher`
  - `Hashed`
  - `PassHash`
  - `PassHasher`
  - `NoOpHash`
- The following items were moved out of `bevy_utils` and into
`bevy_platform_support::collections`:
  - `HashMap`
  - `HashSet`
- `bevy_utils::hashbrown` has been removed. Instead, import from
`bevy_platform_support::collections` _or_ take a dependency on
`hashbrown` directly.
- `bevy_utils::Entry` has been removed. Instead, import from
`bevy_platform_support::collections::hash_map` or
`bevy_platform_support::collections::hash_set` as appropriate.
- All of the above equally apply to `bevy::utils` and
`bevy::platform_support`.
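
A minimal sketch of the updated import paths using the re-exports listed above; the function body is purely illustrative:

```rust
// Before: use bevy_utils::{HashMap, HashSet};
// After: collections come from bevy_platform_support, hash types from its hash module.
use bevy_platform_support::collections::{HashMap, HashSet};

fn unique_names<'a>(names: &[&'a str]) -> HashSet<&'a str> {
    // HashMap::default() uses Bevy's fixed default hasher, now re-exported from
    // bevy_platform_support::hash.
    let mut counts: HashMap<&'a str, u32> = HashMap::default();
    for name in names {
        *counts.entry(*name).or_insert(0) += 1;
    }
    counts.into_keys().collect()
}
```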

## Notes

- I left `PreHashMap`, `PreHashMapExt`, and `TypeIdMap` in `bevy_utils`
as they might be candidates for micro-crating. They can always be moved
into `bevy_platform_support` at a later date if desired.
2025-01-23 16:46:08 +00:00
MichiRecRoom
3742e621ef
Allow clippy::too_many_arguments to lint without warnings (#17249)
# Objective
Many instances of `clippy::too_many_arguments` linting happen to be on
systems - functions which we don't call manually, and thus there's not
much reason to worry about the argument count.

## Solution
Allow `clippy::too_many_arguments` globally, and remove all lint
attributes related to it.
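
For illustration, a hypothetical system with more than Clippy's default limit of seven arguments; before this change it would have needed a `#[allow(clippy::too_many_arguments)]` attribute, and now it compiles warning-free without one:

```rust
use bevy::prelude::*;

// Eight system parameters (> Clippy's default of 7); the parameter choice is made up.
fn debug_scene_system(
    _commands: Commands,
    _time: Res<Time>,
    _asset_server: Res<AssetServer>,
    _meshes: ResMut<Assets<Mesh>>,
    _materials: ResMut<Assets<StandardMaterial>>,
    _cameras: Query<&Camera>,
    _transforms: Query<&Transform>,
    _gizmos: Gizmos,
) {
    // A real system would use these; the point here is only the argument count.
}
```

It is registered with `App::add_systems` like any other system.
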
2025-01-09 07:26:15 +00:00
JMS55
fe58993577
METIS-based meshlet generation (#16947)
# Objective
Improve DAG building for virtual geometry

## Solution

- Use METIS to group triangles into meshlets which lets us minimize
locked vertices which improves simplification, instead of using meshopt
which prioritizes culling efficiency. Also some other minor tweaks.
- Currently most meshlets have 126 triangles, and not 128. Fixing this
might involve calling METIS recursively ourselves to manually bisect the
graph, not sure. Not going to attempt to fix this in this PR.

## Testing

- Did you test these changes? If so, how?
  - Tested on bunny.glb and cliff.glb
- Are there any parts that need more testing?
  - No
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Download the new bunny asset, run the meshlet example.

---

## Showcase

New 

![image](https://github.com/user-attachments/assets/68f5d2f0-a4ca-41e1-90d5-35a2c6969c21)

Old

![image](https://github.com/user-attachments/assets/a3d97a09-773d-44b2-9990-25e1f6b51ec9)

---------

Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
2025-01-05 02:03:26 +00:00
JMS55
b7ee23a59e
Remove meshlet builder retry queue (#16941)
Revert the retry queue for stuck meshlet groups that couldn't simplify
added in https://github.com/bevyengine/bevy/pull/15886.

It was a hack that didn't really work; it was intended to help solve meshlets getting stuck and never getting simplified further. The actual solution is a new DAG building algorithm that I have coming in a followup PR. With that PR, there will be no need for the retry queue, as meshlets will rarely ever get stuck (I checked, the code never gets called). I split this off into its own PR for easier reviewing.

Meshlet IDs during building are back to being relative to the overall
list of meshlets across all LODs, instead of starting at 0 for the first
meshlet in the simplification queue for the current LOD, regardless of
how many meshlets there are in the asset total.

Not going to bother to regenerate the bunny asset for this PR.
2024-12-23 22:16:06 +00:00
Clar Fon
711246aa34
Update hashbrown to 0.15 (#15801)
Updating dependencies; adopted version of #15696. (Supersedes #15696.)

Long answer: hashbrown no longer uses ahash by default, meaning that we can't use the default-hasher methods with ahash. So, we have to use the longer-winded versions instead. This takes the opportunity to also switch our default hasher, but without actually enabling the default-hasher feature for hashbrown, meaning that we'll be able to change our hasher more easily at the cost of all of these method calls being obnoxious forever.

One large change from 0.15 is that `insert_unique_unchecked` is now
`unsafe`, and for cases where unsafe code was denied at the crate level,
I replaced it with `insert`.
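
A minimal sketch of what the longer-winded construction looks like; foldhash's `RandomState` is used here purely for illustration rather than Bevy's actual re-exported hasher types:

```rust
use foldhash::fast::RandomState;
use hashbrown::HashMap;

// Without hashbrown's default-hasher feature, HashMap::new() is unavailable, so the
// BuildHasher is passed explicitly.
fn meshlet_names() -> HashMap<u32, &'static str, RandomState> {
    let mut map = HashMap::with_hasher(RandomState::default());
    map.insert(0, "first meshlet");
    map
}
```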

## Migration Guide

`bevy_utils` has updated its version of `hashbrown` to 0.15 and now
defaults to `foldhash` instead of `ahash`. This means that if you've
hard-coded your hasher to `bevy_utils::AHasher` or separately used the
`ahash` crate in your code, you may need to switch to `foldhash` to
ensure that everything works like it does in Bevy.
2024-12-10 19:45:50 +00:00
Zachary Harrold
a6adced9ed
Deny derive_more error feature and replace it with thiserror (#16684)
# Objective

- Remove `derive_more`'s error derivation and replace it with
`thiserror`

## Solution

- Added `derive_more`'s `error` feature to `deny.toml` to prevent it
sneaking back in.
- Reverted to `thiserror` error derivation

## Notes

Merge conflicts were too numerous to revert the individual changes, so
this reversion was done manually. Please scrutinise carefully during
review.
2024-12-06 17:03:55 +00:00
JMS55
267b57e565
Meshlet normal-aware LOD and meshoptimizer upgrade (#16111)
# Objective

- Choose LOD based on normal simplification error in addition to
position error
- Update meshoptimizer to 0.22, which has a bunch of simplifier
improvements

## Testing

- Did you test these changes? If so, how?
  - Visualize normals, and compare LOD changes before and after. Normals no longer visibly change as the LOD cut changes.
- Are there any parts that need more testing?
  - No
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Run the meshlet example in this PR and on main and move around to change the LOD cut. Before running each example, in meshlet_mesh_material.wgsl, replace `let color = vec3(rand_f(&rng), rand_f(&rng), rand_f(&rng));` with `let color = (vertex_output.world_normal + 1.0) / 2.0;`. Make sure to download the appropriate bunny asset for each branch!
2024-11-04 15:20:22 +00:00
JMS55
6d42830c7f
Meshlet builder improvements redux (#15886)
Take a bunch more improvements from @zeux's nanite.cpp code.

* Use position-only vertices (discard other attributes) to determine
meshlet connectivity for grouping
* Rather than using the lock borders flag when simplifying meshlet
groups, provide the locked vertices ourselves. The lock borders flag
locks the entire border of the meshlet group, but really we only want to
lock the edges between meshlet groups - outwards facing edges are fine
to unlock. This gives a really significant increase to the DAG quality.
* Add back stuck meshlets (group has only a single meshlet,
simplification failed) to the simplification queue to allow them to get
used later on and have another attempt at simplifying
* Target 8 meshlets per group instead of 4 (second biggest improvement
after manual locks)
* Provide a seed to metis for deterministic meshlet building
* Misc other improvements

We can remove the usage of unsafe after the next upstream meshopt
release, but for now we need to use the ffi function directly. I'll do
another round of improvements later, mainly attribute-aware
simplification and using spatial weights for meshlet grouping.

Need to merge https://github.com/bevyengine/bevy/pull/15846 first.
2024-10-23 16:56:50 +00:00
JMS55
9d54fe0370
Meshlet new error projection (#15846)
* New error projection code taken from @zeux's meshoptimizer nanite.cpp
demo for determining LOD (thanks zeux!)
* Builder: `compute_lod_group_data()`
* Runtime: `lod_error_is_imperceptible()`
2024-10-22 20:14:30 +00:00
Rob Parrett
da5d2fccf5
Fix some duplicate words in docs/comments (#15980)
# Objective

Stumbled upon one of these, and set off in search of more, armed with my
trusty `\b(\w+)\s+\1\b`.

## Solution

Remove ~one~ one of them.
2024-10-20 01:03:27 +00:00
Zachary Harrold
46ad0b7513
Remove thiserror from bevy_pbr (#15767)
# Objective

- Contributes to #15460

## Solution

- Removed `thiserror` from `bevy_pbr`
2024-10-09 14:25:16 +00:00
JMS55
aa626e4f0b
Per-meshlet compressed vertex data (#15643)
# Objective
- Prepare for streaming by storing vertex data per-meshlet, rather than
per-mesh (this means duplicating vertices per-meshlet)
- Compress vertex data to reduce the cost of this

## Solution
The important parts are in from_mesh.rs, the changes to the Meshlet type
in asset.rs, and the changes in meshlet_bindings.wgsl. Everything else
is pretty secondary/boilerplate/straightforward changes.

- Positions are quantized in centimeters with a user-provided power of 2
factor (ideally auto-determined, but that's a TODO for the future),
encoded as an offset relative to the minimum value within the meshlet,
and then stored as a packed list of bits using the minimum number of
bits needed for each vertex position channel for that meshlet (see the
sketch after this list)
- E.g. quantize positions (lossily, throwing away precision that's not
needed, leading to using fewer bits in the bitstream encoding)
- Get the min/max quantized value of each X/Y/Z channel of the quantized
positions within a meshlet
- Encode values relative to the min value of the meshlet. E.g. convert
from [min, max] to [0, max - min]
- The new max value in the meshlet is (max - min), which only takes N
bits, so we only need N bits to store each channel within the meshlet
(lossless)
- We can store the min value and that it takes N bits per channel in the
meshlet metadata, and reconstruct the position from the bitstream
- Normals are octahedral encoded and then snorm2x16 packed and stored as
a single u32.
- Would be better to implement the precise variant of octahedral encoding
for extra precision (no extra decode cost), but decided to keep it
simple for now and leave that as a followup
- Tried doing a quantizing and bitstream encoding scheme like I did for
positions, but struggled to get it smaller. Decided to go with this for
simplicity for now
- UVs are uncompressed and take a full 64bits per vertex which is
expensive
  - In the future this should be improved
- Tangents, as of the previous PR, are not explicitly stored and are
instead derived from screen space gradients
- While I'm here, split up MeshletMeshSaverLoader into two separate
types
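
A rough sketch of the per-meshlet position encoding described at the top of this list, for a single channel; this is illustrative plain Rust, not the actual from_mesh.rs code, and the centimeter conversion and bitstream writer are omitted:

```rust
// Returns (per-meshlet minimum, bits per value, re-based values) for one channel.
fn encode_channel(channel: &[f32], quantization_factor: i32) -> (i32, u32, Vec<u32>) {
    assert!(!channel.is_empty());

    // Lossy quantization onto a power-of-two grid.
    let quantized: Vec<i32> = channel
        .iter()
        .map(|&p| (p * 2.0_f32.powi(quantization_factor)).round() as i32)
        .collect();

    // Encode relative to the meshlet minimum: [min, max] -> [0, max - min].
    let min = *quantized.iter().min().unwrap();
    let max = *quantized.iter().max().unwrap();
    let relative: Vec<u32> = quantized.iter().map(|&q| (q - min) as u32).collect();

    // (max - min) bounds every value, so this many bits per value suffices (lossless).
    let bits_per_value = 32 - ((max - min) as u32).leading_zeros();

    (min, bits_per_value, relative)
}
```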

Other future changes include implementing a smaller encoding of triangle
data (3 u8 indices = 24 bits per triangle currently), and more
disk-oriented compression schemes.

References:
* "A Deep Dive into UE5's Nanite Virtualized Geometry"
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf#page=128
(also available on youtube)
* "Towards Practical Meshlet Compression"
https://arxiv.org/pdf/2404.06359
* "Vertex quantization in Omniforce Game Engine"
https://daniilvinn.github.io/2024/05/04/omniforce-vertex-quantization.html

## Testing

- Did you test these changes? If so, how?
  - Converted the Stanford bunny, and rendered it with a debug material
showing normals, and confirmed that it's identical to what's on main.
EDIT: See additional testing in the comments below.
- Are there any parts that need more testing?
  - Could use some more size comparisons on various meshes, and testing
different quantization factors. Not sure if 4 is a good default. EDIT:
See additional testing in the comments below.
  - Also did not test runtime performance of the shaders. EDIT: See
additional testing in the comments below.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Use my unholy script, replacing the meshlet example
https://paste.rs/7xQHk.rs (must make MeshletMesh fields pub instead of
pub crate, must add lz4_flex as a dev-dependency) (must compile with
meshlet and meshlet_processor features, mesh must have only positions,
normals, and UVs, no vertex colors or tangents)

---

## Migration Guide
- TBD by JMS55 at the end of the release
2024-10-08 18:42:55 +00:00
vero
6465e3bd9f
Fix Mesh allocator bug and reduce Mesh data copies by two (#15566)
# Objective

- First step towards #15558

## Solution

- Rename `get_vertex_buffer_data` to `create_packed_vertex_buffer_data`
to make it clear that it is not "free" and actually allocates
- Compute length analytically for preallocation instead of creating the
buffer to get its length and immediately discarding it
- Use existing vertex attribute size calculation method to reduce code
duplication
- Fix a bug where mesh index data was being replaced by unnecessarily
newly created mesh vertex data in some cases
- Overall reduces mesh copies by two. We still have plenty to go, but
these were the easy ones.

## Testing

- I ran 3d_scene, lighting, and many_cubes, they look fine.
- Benchmarks would be nice, but this is very obviously a win in perf and
correctness.

---

## Migration Guide

- `Mesh::get_vertex_buffer_data` has been renamed
`Mesh::create_packed_vertex_buffer_data` to reflect the fact that it
copies data and allocates.

## Showcase

- look mom, less copies
2024-10-01 17:15:57 +00:00
JMS55
9cc7e7c080
Meshlet screenspace-derived tangents (#15084)
* Save 16 bytes per vertex by calculating tangents in the shader at
runtime, rather than storing them in the vertex data.
* Based on https://jcgt.org/published/0009/03/04,
https://www.jeremyong.com/graphics/2023/12/16/surface-gradient-bump-mapping.
* Fixed visbuffer resolve to use the updated algorithm that flips ddy
correctly
* Added some more docs about meshlet material limitations, and some
TODOs about transforming UV coordinates for the future.


![image](https://github.com/user-attachments/assets/222d8192-8c82-4d77-945d-53670a503761)

For testing add a normal map to the bunnies with StandardMaterial like
below, and then test that on both main and this PR (make sure to
download the correct bunny for each). Results should be mostly
identical.

```rust
normal_map_texture: Some(asset_server.load_with_settings(
    "textures/BlueNoise-Normal.png",
    |settings: &mut ImageLoaderSettings| settings.is_srgb = false,
)),
```
2024-09-29 18:39:25 +00:00
Zachary Harrold
d70595b667
Add core and alloc over std Lints (#15281)
# Objective

- Fixes #6370
- Closes #6581

## Solution

- Added the following lints to the workspace:
  - `std_instead_of_core`
  - `std_instead_of_alloc`
  - `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.
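
For illustration, the kind of code the new lints steer contributors toward (a hypothetical snippet, not part of this PR):

```rust
// In a library crate root, add `extern crate alloc;` and prefer core/alloc paths.
extern crate alloc;

use alloc::{string::String, vec::Vec}; // instead of std::string / std::vec
use core::mem;                         // instead of std::mem

fn drain_labels(labels: &mut Vec<String>) -> Vec<String> {
    // mem::take works identically whether imported from core or std.
    mem::take(labels)
}
```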

## Testing

- Ran CI locally

## Migration Guide

The MSRV is now 1.81. Please update to this version or higher.

## Notes

- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
Clar Fon
efda7f3f9c
Simpler lint fixes: makes ci lints work but disables a lint for now (#15376)
Takes the first two commits from #15375 and adds suggestions from this
comment:
https://github.com/bevyengine/bevy/pull/15375#issuecomment-2366968300

See #15375 for more reasoning/motivation.

## Rebasing (rerunning)

```sh
git switch simpler-lint-fixes
git reset --hard main
cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate
cargo fmt --all
git add --update
git commit --message "rustfmt"
cargo clippy --workspace --all-targets --all-features --fix
cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate
cargo fmt --all
git add --update
git commit --message "clippy"
git cherry-pick e6c0b94f6795222310fb812fa5c4512661fc7887
```
2024-09-24 11:42:59 +00:00
JMS55
a0faf9cd01
More triangles/vertices per meshlet (#15023)
### Builder changes
- Increased meshlet max vertices/triangles from 64v/64t to 255v/128t
(meshoptimizer won't allow 256v sadly). This gives us a much greater
percentage of meshlets with max triangle count (128). Still not perfect,
we still end up with some tiny <=10 triangle meshlets that never really
get simplified, but it's progress.
- Removed the error target limit. Now we allow meshoptimizer to simplify
as much as possible. No reason to cap this out, as the cluster culling
code will choose a good LOD level anyways. Again leads to higher quality
LOD trees.
- After some discussion and consulting the Nanite slides again, changed
meshlet group error from _adding_ the max child's error to the group
error, to doing `group_error = max(group_error, max_child_error)`. Error
is already cumulative between LODs as the edges we're collapsing during
simplification get longer each time.
- Bumped the 65% simplification threshold to allow up to 95% of the
original geometry (e.g. accept simplification as valid even if we only
simplified 5% of the triangles). This gives us closer to
log2(initial_meshlet_count) LOD levels, and fewer meshlet roots in the
DAG.
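
A minimal sketch of the group error rule above, written as plain Rust rather than the builder's actual code:

```rust
// A group's error is the max of its own simplification error and its children's
// errors, not their sum, because error already accumulates across LOD levels.
fn meshlet_group_error(own_simplification_error: f32, child_errors: &[f32]) -> f32 {
    let max_child_error = child_errors.iter().copied().fold(0.0_f32, f32::max);
    own_simplification_error.max(max_child_error)
}
```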

Still more work to be done in the future here. Maybe trying METIS for
meshlet building instead of meshoptimizer.

Using ~8 clusters per group instead of ~4 might also make a big
difference. The Nanite slides say that they have 8-32 meshlets per
group, suggesting some kind of heuristic. Unfortunately meshopt's
compute_cluster_bounds won't work with large groups atm
(https://github.com/zeux/meshoptimizer/discussions/750#discussioncomment-10562641)
so hard to test.

Based on discussion from
https://github.com/bevyengine/bevy/discussions/14998,
https://github.com/zeux/meshoptimizer/discussions/750, and discord.

### Runtime changes
- cluster:triangle packed IDs are now stored 25:7 instead of 26:6 bits,
as max triangles per cluster are now 128 instead of 64
- Hardware raster now spawns 128 * 3 vertices instead of 64 * 3 vertices
to account for the new max triangles limit
- Hardware raster now outputs NaN triangles (0 / 0) instead of zero-positioned triangles for extra vertex invocations over the cluster triangle count. Shouldn't really make a difference, I think, but I did it anyways.
- Software raster now does 128 threads per workgroup instead of 64
threads. Each thread now loads, projects, and caches a vertex (vertices
0-127), and then if needed does so again (vertices 128-254). Each thread
then rasterizes one of 128 triangles.
- Fixed a bug with `needs_dispatch_remap`. I had the condition backwards
in my last PR, I probably committed it by accident after testing the
non-default code path on my GPU.
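
For illustration, packing helpers matching the new 25:7 cluster:triangle split mentioned at the top of this list (names and layout here are assumptions, not the actual shader code):

```rust
const TRIANGLE_BITS: u32 = 7; // up to 128 triangles per cluster

fn pack_ids(cluster_id: u32, triangle_id: u32) -> u32 {
    debug_assert!(cluster_id < (1 << (32 - TRIANGLE_BITS)));
    debug_assert!(triangle_id < (1 << TRIANGLE_BITS));
    (cluster_id << TRIANGLE_BITS) | triangle_id
}

fn unpack_ids(packed: u32) -> (u32, u32) {
    (packed >> TRIANGLE_BITS, packed & ((1 << TRIANGLE_BITS) - 1))
}
```
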
2024-09-08 17:55:57 +00:00
JMS55
6cc96f4c1f
Meshlet software raster + start of cleanup (#14623)
# Objective
- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs

## Solution
- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer,
where the left 32 bits encode depth, and the right 32 bits encode the
existing cluster + triangle IDs (see the packing sketch after this
list). Can't use 64bit textures, wgpu/naga doesn't support atomic ops on
textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per
triangle) to raster, the culling pass now writes out a buffer of just
cluster IDs (per cluster, so less memory allocated, cheaper to write
out).
  - Clusters for software raster are allocated from the left side
  - Clusters for hardware raster are allocated in the same buffer, from the right side
- The buffer size is fixed at MeshletPlugin build time, and should be
set to a reasonable value for your scene (no warning on overflow, and no
good way to determine what value you need outside of renderdoc - I plan
to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster
selection for each cluster. The existing code is just a placeholder. I
need to profile on a release scene and come up with a heuristic,
probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but
I don't want to spend time improving it as the entire shader/pass is
getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup
loads and transforms the <=64 vertices of the cluster, and then
rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger
triangles (still smaller than hardware is good at), and brute-force for
very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does
an atomicMax() on the visbuffer to store the results, copying how Nanite
works
- On devices with a low max workgroups per dispatch limit, an extra
compute pass is inserted before software raster to convert from a 1d to
2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet, I'm
leaving that for a future PR since I get usable results without it for
now
- Resources used:
https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters
6-8 of
https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet,
instead of the actual meshlet vertex count. Extra invocations just
early-exit.
- While this is slower than the existing system, hardware draws should
be rare now that software raster is usable, and it saves a ton of memory
using the unified cluster ID buffer. This would be fixed if wgpu had
support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster
pass also does the same atomic visbuffer writes that software raster
uses.
- We have to bind a dummy render target anyways, as wgpu doesn't
currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization
passes.
- If we had async compute queues, we could overlap the software and
hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer
node, and write out view depth and material ID depth textures
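
A plain-Rust sketch of the 64-bit visbuffer packing described in the first bullet of this list; the real implementation is WGSL, and a reverse-Z-style "larger depth bits win" convention is assumed here:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Depth in the high 32 bits, packed cluster + triangle ID in the low 32 bits, so a
// single atomic max keeps the nearest fragment and its IDs together.
fn pack_visbuffer_entry(depth: f32, cluster_and_triangle_id: u32) -> u64 {
    ((depth.to_bits() as u64) << 32) | cluster_and_triangle_id as u64
}

fn write_pixel(pixel: &AtomicU64, depth: f32, cluster_and_triangle_id: u32) {
    // Mirrors the shader's atomicMax(): the winning fragment's packed value survives.
    pixel.fetch_max(pack_visbuffer_entry(depth, cluster_and_triangle_id), Ordering::Relaxed);
}
```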

### Misc changes
- Fixed cluster culling importing, but never actually using, the previous view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet
mesh
- Split up the gpu_scene module into meshlet_mesh_manager, instance_manager, and resource_manager
- resource_manager is still too complex and inefficient (extract and
prepare are way too expensive). I plan on improving this in a future PR,
but for now ResourceManager is mostly a 1:1 port of the leftover
MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material
shade pass, as well as some other misc renaming (in the future, these
will be compute shaders even, and not actual draw calls)

---

## Migration Guide
- TBD (ask me at the end of the release for meshlet changes as a whole)

---------

Co-authored-by: vero <email@atlasdostal.com>
2024-08-26 17:54:34 +00:00
Sarthak Singh
2c4ef37b76
Changed Mesh::attributes* functions to return MeshVertexAttribute (#14394)
# Objective

Fixes #14365 

## Migration Guide

- When using the iterator returned by `Mesh::attributes` or
`Mesh::attributes_mut`, the first value of the tuple is now the
`MeshVertexAttribute` instead of the `MeshVertexAttributeId`. To access
the `MeshVertexAttributeId`, use the `MeshVertexAttribute.id` field.

Signed-off-by: Sarthak Singh <sarthak.singh99@gmail.com>
2024-08-12 15:54:28 +00:00
JMS55
6e8d43a037
Faster MeshletMesh deserialization (#14193)
# Objective
- Using bincode to deserialize binary into a MeshletMesh is expensive
(~77ms for a 5mb file).

## Solution
- Write a custom deserializer using bytemuck's Pod types and slice
casting.
  - Total asset load time has gone from ~102ms to ~12ms.
- Change some types I never meant to be public to private and other misc
cleanup.
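
A minimal sketch of the slice-casting approach (illustrative vertex layout, not the actual asset format):

```rust
use bytemuck::{Pod, Zeroable};

// A Pod struct can be reinterpreted directly from raw bytes with no per-field parsing.
#[derive(Clone, Copy, Pod, Zeroable)]
#[repr(C)]
struct PackedVertex {
    position: [f32; 3],
    normal: u32, // e.g. octahedral + snorm2x16 packed
}

fn read_vertices(bytes: &[u8]) -> &[PackedVertex] {
    // Zero-copy reinterpretation; panics if the length or alignment doesn't match.
    bytemuck::cast_slice(bytes)
}
```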

## Testing
- Ran the meshlet example and added timing spans to the asset loader.

---

## Changelog
- Improved `MeshletMesh` loading speed
- The `MeshletMesh` disk format has changed, and
`MESHLET_MESH_ASSET_VERSION` has been bumped
- `MeshletMesh` fields are now private
- Renamed `MeshletMeshSaverLoad` to `MeshletMeshSaverLoader`
- The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere`
types are now private
- Removed `MeshletMeshSaveOrLoadError::SerializationOrDeserialization`
- Added `MeshletMeshSaveOrLoadError::WrongFileType`

## Migration Guide
- Regenerate your `MeshletMesh` assets, as the disk format has changed,
and `MESHLET_MESH_ASSET_VERSION` has been bumped
- `MeshletMesh` fields are now private
- `MeshletMeshSaverLoad` is now named `MeshletMeshSaverLoader`
- The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere`
types are now private
- `MeshletMeshSaveOrLoadError::SerializationOrDeserialization` has been
removed
- Added `MeshletMeshSaveOrLoadError::WrongFileType`, match on this
variant if you match on `MeshletMeshSaveOrLoadError`
2024-07-15 15:06:02 +00:00
Arseny Kapoulkine
4cd188568a
Improve MeshletMesh::from_mesh performance further (#14038)
This change updates meshopt-rs to 0.3 to take advantage of the newly
added sparse simplification mode: by default, the simplifier assumes that the entire mesh is simplified and runs a set of calculations that are O(vertex count), but in our case we simplify many small mesh subsets, which is inefficient.

Sparse mode instead assumes that the simplified subset is only using a
portion of the vertex buffer, and optimizes accordingly. This changes
the meaning of the error (as it becomes relative to the subset, in our
case a meshlet group); to ensure consistent error selection, we also use
the ErrorAbsolute mode which allows us to operate in mesh coordinate
space.

Additionally, meshopt 0.3 runs optimizeMeshlet automatically as part of
`build_meshlets` so we no longer need to call it ourselves.

This reduces the time to build meshlet representation for Stanford Bunny
mesh from ~1.65s to ~0.45s (3.7x) in optimized builds.
2024-06-27 00:06:22 +00:00
Arseny Kapoulkine
6eec73a9a5
Make meshlet processing deterministic (#13913)
This is a followup to https://github.com/bevyengine/bevy/pull/13904 based on the discussion there, and switches two HashMaps that used meshlet ids as keys to Vecs.

In addition to a small further performance boost for `from_mesh` (1.66s
=> 1.60s), this makes processing deterministic modulo threading issues
wrt CRT rand described in the linked PR. This is valuable for debugging,
as you can visually or programmatically inspect the meshlet distribution
before/after making changes that should not change the output, whereas
previously every asset rebuild would change the meshlet structure.

Tested with https://github.com/bevyengine/bevy/pull/13431; after this
change, the visual output of meshlets is consistent between asset
rebuilds, and the MD5 of the output GLB file does not change either,
which was not the case before.
2024-06-20 00:58:43 +00:00
Arseny Kapoulkine
001cc147c6
Improve MeshletMesh::from_mesh performance (#13904)
This change reworks `find_connected_meshlets` to scale more linearly
with the mesh size, which significantly reduces the cost of building
meshlet representations. As a small extra complexity reduction, it moves the `simplify_scale` call out of the loop so that it's called once (it only depends on the vertex data, so it's safe to cache).

The new implementation of connectivity analysis builds an edge => meshlet list data structure, which allows us to only iterate through `tuple_combinations` of a (usually) small list. There is still some redundancy: if two meshlets share two edges, they will be represented in the meshlet lists twice, but it's overall much faster.

Since the hash traversal is non-deterministic, to keep this part of the
algorithm deterministic for reproducible results we sort the output
adjacency lists.
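
A minimal sketch of the edge => meshlet-list idea, in standalone Rust with illustrative types rather than the actual `find_connected_meshlets` code:

```rust
use std::collections::HashMap;

// Each meshlet is given as a list of its (vertex, vertex) edges; meshlets that share
// an edge entry are connected.
fn connected_meshlet_pairs(meshlet_edges: &[Vec<(u32, u32)>]) -> Vec<(usize, usize)> {
    let mut edge_to_meshlets: HashMap<(u32, u32), Vec<usize>> = HashMap::new();
    for (meshlet_id, edges) in meshlet_edges.iter().enumerate() {
        for &(a, b) in edges {
            // Normalize edge direction so (a, b) and (b, a) hash identically.
            let key = if a < b { (a, b) } else { (b, a) };
            edge_to_meshlets.entry(key).or_default().push(meshlet_id);
        }
    }
    let mut pairs = Vec::new();
    for meshlets in edge_to_meshlets.values() {
        for i in 0..meshlets.len() {
            for j in (i + 1)..meshlets.len() {
                pairs.push((meshlets[i], meshlets[j]));
            }
        }
    }
    // Sort and dedup for deterministic output, since hash traversal order varies.
    pairs.sort_unstable();
    pairs.dedup();
    pairs
}
```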

Overall this reduces the time to process bunny mesh from ~4.2s to ~1.7s
when using release; in unoptimized builds the delta is even more
significant.

This was tested by using https://github.com/bevyengine/bevy/pull/13431
and:

a) comparing the result of `find_connected_meshlets` using old and new
code; they are equal in all steps of the clustering process
b) comparing the rendered result of the old code vs new code *after*
making the rest of the algorithm deterministic: right now the loop that
iterates through the result of `group_meshlets()` call executes in
different order between program runs. This is orthogonal to this change
and can be fixed separately.

Note: a future change can shrink the processing time further from ~1.7s
to ~0.4s with a small diff but that requires an update to meshopt crate
which is pending in https://github.com/gwihlidal/meshopt-rs/pull/42.
This change is independent.
2024-06-18 08:29:17 +00:00
JMS55
6d6810c90d
Meshlet continuous LOD (#12755)
Adds a basic level of detail system to meshlets. An extremely brief
summary is as follows:
* In `from_mesh.rs`, once we've built the first level of clusters, we
group clusters, simplify the new mega-clusters, and then split the
simplified groups back into regular sized clusters. Repeat several times
(ideally until you can't anymore). This forms a directed acyclic graph
(DAG), where the children are the meshlets from the previous level, and
the parents are the more simplified versions of their children. The leaf
nodes are meshlets formed from the original mesh.
* In `cull_meshlets.wgsl`, each cluster selects whether to render or not
based on the LOD bounding sphere (different than the culling bounding
sphere) of the current meshlet, the LOD bounding sphere of its parent
(the meshlet group from simplification), and the simplification error
relative to its children of both the current meshlet and its parent
meshlet. This kind of breaks two pass occlusion culling, which will be
fixed in a future PR by using an HZB from the previous frame to get the
initial list of occluders.
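
A heavily simplified sketch of the per-cluster LOD test described above; the projection formula, threshold, and function names are illustrative, not the actual cull_meshlets.wgsl logic:

```rust
// Approximate screen-space size (in pixels) of a world-space error at a given distance.
fn projected_error_px(error: f32, sphere_radius: f32, distance: f32, screen_height: f32, fov_y: f32) -> f32 {
    let d = (distance - sphere_radius).max(f32::EPSILON);
    error / d * (screen_height / (2.0 * (fov_y * 0.5).tan()))
}

// Render a cluster when its own error is imperceptible but its parent group's is not.
fn should_render(self_error_px: f32, parent_error_px: f32, threshold_px: f32) -> bool {
    self_error_px <= threshold_px && parent_error_px > threshold_px
}
```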

Many, _many_ improvements to be done in the future
https://github.com/bevyengine/bevy/issues/11518, not least of which is
code quality and speed. I don't even expect this to work on many types
of input meshes. This is just a basic implementation/draft for
collaboration.

Arguable how much we want to do in this PR, I'll leave that up to
maintainers. I've erred on the side of "as basic as possible".

References:
* Slides 27-77 (video available on youtube)
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
*
https://blog.traverseresearch.nl/creating-a-directed-acyclic-graph-from-a-mesh-1329e57286e5
*
https://jglrxavpok.github.io/2024/01/19/recreating-nanite-lod-generation.html,
https://jglrxavpok.github.io/2024/03/12/recreating-nanite-faster-lod-generation.html,
https://jglrxavpok.github.io/2024/04/02/recreating-nanite-runtime-lod-selection.html,
and https://github.com/jglrxavpok/Carrot
*
https://github.com/gents83/INOX/tree/master/crates/plugins/binarizer/src
* https://cs418.cs.illinois.edu/website/text/nanite.html


![image](https://github.com/bevyengine/bevy/assets/47158642/e40bff9b-7d0c-4a19-a3cc-2aad24965977)

![image](https://github.com/bevyengine/bevy/assets/47158642/442c7da3-7761-4da7-9acd-37f15dd13e26)

---------

Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
2024-04-23 21:43:53 +00:00
JMS55
4f20faaa43
Meshlet rendering (initial feature) (#10164)
# Objective
- Implements a more efficient, GPU-driven
(https://github.com/bevyengine/bevy/issues/1342) rendering pipeline
based on meshlets.
- Meshes are split into small clusters of triangles called meshlets,
each of which acts as a mini index buffer into the larger mesh data.
Meshlets can be compressed, streamed, culled, and batched much more
efficiently than monolithic meshes.


![image](https://github.com/bevyengine/bevy/assets/47158642/cb2aaad0-7a9a-4e14-93b0-15d4e895b26a)

![image](https://github.com/bevyengine/bevy/assets/47158642/7534035b-1eb7-4278-9b99-5322e4401715)

# Misc
* Future work: https://github.com/bevyengine/bevy/issues/11518
* Nanite reference:
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
* Two pass occlusion culling explained very well:
https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

---------

Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
2024-03-25 19:08:27 +00:00