Commit Graph

25 Commits

Author SHA1 Message Date
Zachary Harrold
9bc0ae33c3
Move hashbrown and foldhash out of bevy_utils (#17460)
# Objective

- Contributes to #16877

## Solution

- Moved `hashbrown`, `foldhash`, and related types out of `bevy_utils`
and into `bevy_platform_support`
- Refactored the above to match the layout of these types in `std`.
- Updated crates as required.

## Testing

- CI

---

## Migration Guide

- The following items were moved out of `bevy_utils` and into
`bevy_platform_support::hash`:
  - `FixedState`
  - `DefaultHasher`
  - `RandomState`
  - `FixedHasher`
  - `Hashed`
  - `PassHash`
  - `PassHasher`
  - `NoOpHash`
- The following items were moved out of `bevy_utils` and into
`bevy_platform_support::collections`:
  - `HashMap`
  - `HashSet`
- `bevy_utils::hashbrown` has been removed. Instead, import from
`bevy_platform_support::collections` _or_ take a dependency on
`hashbrown` directly.
- `bevy_utils::Entry` has been removed. Instead, import from
`bevy_platform_support::collections::hash_map` or
`bevy_platform_support::collections::hash_set` as appropriate.
- All of the above equally apply to `bevy::utils` and
`bevy::platform_support`.
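
A minimal sketch of the updated import paths using the re-exports listed above; the function body is purely illustrative:

```rust
// Before: use bevy_utils::{HashMap, HashSet};
// After: collections come from bevy_platform_support, hash types from its hash module.
use bevy_platform_support::collections::{HashMap, HashSet};

fn unique_names<'a>(names: &[&'a str]) -> HashSet<&'a str> {
    // HashMap::default() uses Bevy's fixed default hasher, now re-exported from
    // bevy_platform_support::hash.
    let mut counts: HashMap<&'a str, u32> = HashMap::default();
    for name in names {
        *counts.entry(*name).or_insert(0) += 1;
    }
    counts.into_keys().collect()
}
```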

## Notes

- I left `PreHashMap`, `PreHashMapExt`, and `TypeIdMap` in `bevy_utils`
as they might be candidates for micro-crating. They can always be moved
into `bevy_platform_support` at a later date if desired.
2025-01-23 16:46:08 +00:00
MichiRecRoom
3742e621ef
Allow clippy::too_many_arguments to lint without warnings (#17249)
# Objective
Many instances of `clippy::too_many_arguments` linting happen to be on
systems - functions which we don't call manually, and thus there's not
much reason to worry about the argument count.

## Solution
Allow `clippy::too_many_arguments` globally, and remove all lint
attributes related to it.
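
For illustration, a hypothetical system with more than Clippy's default limit of seven arguments; before this change it would have needed a `#[allow(clippy::too_many_arguments)]` attribute, and now it compiles warning-free without one:

```rust
use bevy::prelude::*;

// Eight system parameters (> Clippy's default of 7); the parameter choice is made up.
fn debug_scene_system(
    _commands: Commands,
    _time: Res<Time>,
    _asset_server: Res<AssetServer>,
    _meshes: ResMut<Assets<Mesh>>,
    _materials: ResMut<Assets<StandardMaterial>>,
    _cameras: Query<&Camera>,
    _transforms: Query<&Transform>,
    _gizmos: Gizmos,
) {
    // A real system would use these; the point here is only the argument count.
}
```

It is registered with `App::add_systems` like any other system.
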
2025-01-09 07:26:15 +00:00
JMS55
fe58993577
METIS-based meshlet generation (#16947)
# Objective
Improve DAG building for virtual geometry

## Solution

- Use METIS to group triangles into meshlets which lets us minimize
locked vertices which improves simplification, instead of using meshopt
which prioritizes culling efficiency. Also some other minor tweaks.
- Currently most meshlets have 126 triangles, and not 128. Fixing this
might involve calling METIS recursively ourselves to manually bisect the
graph, not sure. Not going to attempt to fix this in this PR.

## Testing

- Did you test these changes? If so, how?
  - Tested on bunny.glb and cliff.glb
- Are there any parts that need more testing?
  - No
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Download the new bunny asset, run the meshlet example.

---

## Showcase

New 

![image](https://github.com/user-attachments/assets/68f5d2f0-a4ca-41e1-90d5-35a2c6969c21)

Old

![image](https://github.com/user-attachments/assets/a3d97a09-773d-44b2-9990-25e1f6b51ec9)

---------

Co-authored-by: IceSentry <IceSentry@users.noreply.github.com>
2025-01-05 02:03:26 +00:00
JMS55
b7ee23a59e
Remove meshlet builder retry queue (#16941)
Revert the retry queue for stuck meshlet groups that couldn't simplify
added in https://github.com/bevyengine/bevy/pull/15886.

It was a hack that didn't really work; it was intended to help solve meshlets getting stuck and never getting simplified further. The actual solution is a new DAG building algorithm that I have coming in a followup PR. With that PR, there will be no need for the retry queue, as meshlets will rarely ever get stuck (I checked, the code never gets called). I split this off into its own PR for easier reviewing.

Meshlet IDs during building are back to being relative to the overall
list of meshlets across all LODs, instead of starting at 0 for the first
meshlet in the simplification queue for the current LOD, regardless of
how many meshlets there are in the asset total.

Not going to bother to regenerate the bunny asset for this PR.
2024-12-23 22:16:06 +00:00
Clar Fon
711246aa34
Update hashbrown to 0.15 (#15801)
Updating dependencies; adopted version of #15696. (Supersedes #15696.)

Long answer: hashbrown no longer uses ahash by default, meaning that we can't use the default-hasher methods with ahash. So, we have to use the longer-winded versions instead. This takes the opportunity to also switch our default hasher, but without actually enabling the default-hasher feature for hashbrown, meaning that we'll be able to change our hasher more easily at the cost of all of these method calls being obnoxious forever.

One large change from 0.15 is that `insert_unique_unchecked` is now
`unsafe`, and for cases where unsafe code was denied at the crate level,
I replaced it with `insert`.
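
A minimal sketch of what the longer-winded construction looks like; foldhash's `RandomState` is used here purely for illustration rather than Bevy's actual re-exported hasher types:

```rust
use foldhash::fast::RandomState;
use hashbrown::HashMap;

// Without hashbrown's default-hasher feature, HashMap::new() is unavailable, so the
// BuildHasher is passed explicitly.
fn meshlet_names() -> HashMap<u32, &'static str, RandomState> {
    let mut map = HashMap::with_hasher(RandomState::default());
    map.insert(0, "first meshlet");
    map
}
```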

## Migration Guide

`bevy_utils` has updated its version of `hashbrown` to 0.15 and now
defaults to `foldhash` instead of `ahash`. This means that if you've
hard-coded your hasher to `bevy_utils::AHasher` or separately used the
`ahash` crate in your code, you may need to switch to `foldhash` to
ensure that everything works like it does in Bevy.
2024-12-10 19:45:50 +00:00
Zachary Harrold
a6adced9ed
Deny derive_more error feature and replace it with thiserror (#16684)
# Objective

- Remove `derive_more`'s error derivation and replace it with
`thiserror`

## Solution

- Added `derive_more`'s `error` feature to `deny.toml` to prevent it
sneaking back in.
- Reverted to `thiserror` error derivation

## Notes

Merge conflicts were too numerous to revert the individual changes, so
this reversion was done manually. Please scrutinise carefully during
review.
2024-12-06 17:03:55 +00:00
JMS55
267b57e565
Meshlet normal-aware LOD and meshoptimizer upgrade (#16111)
# Objective

- Choose LOD based on normal simplification error in addition to
position error
- Update meshoptimizer to 0.22, which has a bunch of simplifier
improvements

## Testing

- Did you test these changes? If so, how?
  - Visualize normals, and compare LOD changes before and after. Normals no longer visibly change as the LOD cut changes.
- Are there any parts that need more testing?
  - No
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Run the meshlet example in this PR and on main and move around to change the LOD cut. Before running each example, in meshlet_mesh_material.wgsl, replace `let color = vec3(rand_f(&rng), rand_f(&rng), rand_f(&rng));` with `let color = (vertex_output.world_normal + 1.0) / 2.0;`. Make sure to download the appropriate bunny asset for each branch!
2024-11-04 15:20:22 +00:00
JMS55
6d42830c7f
Meshlet builder improvements redux (#15886)
Take a bunch more improvements from @zeux's nanite.cpp code.

* Use position-only vertices (discard other attributes) to determine
meshlet connectivity for grouping
* Rather than using the lock borders flag when simplifying meshlet
groups, provide the locked vertices ourselves. The lock borders flag
locks the entire border of the meshlet group, but really we only want to
lock the edges between meshlet groups - outwards facing edges are fine
to unlock. This gives a really significant increase to the DAG quality.
* Add back stuck meshlets (group has only a single meshlet,
simplification failed) to the simplification queue to allow them to get
used later on and have another attempt at simplifying
* Target 8 meshlets per group instead of 4 (second biggest improvement
after manual locks)
* Provide a seed to metis for deterministic meshlet building
* Misc other improvements

We can remove the usage of unsafe after the next upstream meshopt
release, but for now we need to use the ffi function directly. I'll do
another round of improvements later, mainly attribute-aware
simplification and using spatial weights for meshlet grouping.

Need to merge https://github.com/bevyengine/bevy/pull/15846 first.
2024-10-23 16:56:50 +00:00
JMS55
9d54fe0370
Meshlet new error projection (#15846)
* New error projection code taken from @zeux's meshoptimizer nanite.cpp
demo for determining LOD (thanks zeux!)
* Builder: `compute_lod_group_data()`
* Runtime: `lod_error_is_imperceptible()`
2024-10-22 20:14:30 +00:00
Rob Parrett
da5d2fccf5
Fix some duplicate words in docs/comments (#15980)
# Objective

Stumbled upon one of these, and set off in search of more, armed with my
trusty `\b(\w+)\s+\1\b`.

## Solution

Remove ~one~ one of them.
2024-10-20 01:03:27 +00:00
Zachary Harrold
46ad0b7513
Remove thiserror from bevy_pbr (#15767)
# Objective

- Contributes to #15460

## Solution

- Removed `thiserror` from `bevy_pbr`
2024-10-09 14:25:16 +00:00
JMS55
aa626e4f0b
Per-meshlet compressed vertex data (#15643)
# Objective
- Prepare for streaming by storing vertex data per-meshlet, rather than
per-mesh (this means duplicating vertices per-meshlet)
- Compress vertex data to reduce the cost of this

## Solution
The important parts are in from_mesh.rs, the changes to the Meshlet type
in asset.rs, and the changes in meshlet_bindings.wgsl. Everything else
is pretty secondary/boilerplate/straightforward changes.

- Positions are quantized in centimeters with a user-provided power of 2
factor (ideally auto-determined, but that's a TODO for the future),
encoded as an offset relative to the minimum value within the meshlet,
and then stored as a packed list of bits using the minimum number of
bits needed for each vertex position channel for that meshlet (see the
sketch after this list)
- E.g. quantize positions (lossily, throwing away precision that's not
needed, leading to using fewer bits in the bitstream encoding)
- Get the min/max quantized value of each X/Y/Z channel of the quantized
positions within a meshlet
- Encode values relative to the min value of the meshlet. E.g. convert
from [min, max] to [0, max - min]
- The new max value in the meshlet is (max - min), which only takes N
bits, so we only need N bits to store each channel within the meshlet
(lossless)
- We can store the min value and that it takes N bits per channel in the
meshlet metadata, and reconstruct the position from the bitstream
- Normals are octahedral encoded and then snorm2x16 packed and stored as
a single u32.
- Would be better to implement the precise variant of octahedral encoding
for extra precision (no extra decode cost), but decided to keep it
simple for now and leave that as a followup
- Tried doing a quantizing and bitstream encoding scheme like I did for
positions, but struggled to get it smaller. Decided to go with this for
simplicity for now
- UVs are uncompressed and take a full 64bits per vertex which is
expensive
  - In the future this should be improved
- Tangents, as of the previous PR, are not explicitly stored and are
instead derived from screen space gradients
- While I'm here, split up MeshletMeshSaverLoader into two separate
types
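
A rough sketch of the per-meshlet position encoding described at the top of this list, for a single channel; this is illustrative plain Rust, not the actual from_mesh.rs code, and the centimeter conversion and bitstream writer are omitted:

```rust
// Returns (per-meshlet minimum, bits per value, re-based values) for one channel.
fn encode_channel(channel: &[f32], quantization_factor: i32) -> (i32, u32, Vec<u32>) {
    assert!(!channel.is_empty());

    // Lossy quantization onto a power-of-two grid.
    let quantized: Vec<i32> = channel
        .iter()
        .map(|&p| (p * 2.0_f32.powi(quantization_factor)).round() as i32)
        .collect();

    // Encode relative to the meshlet minimum: [min, max] -> [0, max - min].
    let min = *quantized.iter().min().unwrap();
    let max = *quantized.iter().max().unwrap();
    let relative: Vec<u32> = quantized.iter().map(|&q| (q - min) as u32).collect();

    // (max - min) bounds every value, so this many bits per value suffices (lossless).
    let bits_per_value = 32 - ((max - min) as u32).leading_zeros();

    (min, bits_per_value, relative)
}
```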

Other future changes include implementing a smaller encoding of triangle
data (3 u8 indices = 24 bits per triangle currently), and more
disk-oriented compression schemes.

References:
* "A Deep Dive into UE5's Nanite Virtualized Geometry"
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf#page=128
(also available on youtube)
* "Towards Practical Meshlet Compression"
https://arxiv.org/pdf/2404.06359
* "Vertex quantization in Omniforce Game Engine"
https://daniilvinn.github.io/2024/05/04/omniforce-vertex-quantization.html

## Testing

- Did you test these changes? If so, how?
  - Converted the Stanford bunny, and rendered it with a debug material
showing normals, and confirmed that it's identical to what's on main.
EDIT: See additional testing in the comments below.
- Are there any parts that need more testing?
  - Could use some more size comparisons on various meshes, and testing
different quantization factors. Not sure if 4 is a good default. EDIT:
See additional testing in the comments below.
  - Also did not test runtime performance of the shaders. EDIT: See
additional testing in the comments below.
- How can other people (reviewers) test your changes? Is there anything
specific they need to know?
  - Use my unholy script, replacing the meshlet example
https://paste.rs/7xQHk.rs (must make MeshletMesh fields pub instead of
pub crate, must add lz4_flex as a dev-dependency) (must compile with
meshlet and meshlet_processor features, mesh must have only positions,
normals, and UVs, no vertex colors or tangents)

---

## Migration Guide
- TBD by JMS55 at the end of the release
2024-10-08 18:42:55 +00:00
vero
6465e3bd9f
Fix Mesh allocator bug and reduce Mesh data copies by two (#15566)
# Objective

- First step towards #15558

## Solution

- Rename `get_vertex_buffer_data` to `create_packed_vertex_buffer_data`
to make it clear that it is not "free" and actually allocates
- Compute length analytically for preallocation instead of creating the
buffer to get its length and immediately discarding it
- Use existing vertex attribute size calculation method to reduce code
duplication
- Fix a bug where mesh index data was being replaced by unnecessarily
newly created mesh vertex data in some cases
- Overall reduces mesh copies by two. We still have plenty to go, but
these were the easy ones.

## Testing

- I ran 3d_scene, lighting, and many_cubes, they look fine.
- Benchmarks would be nice, but this is very obviously a win in perf and
correctness.

---

## Migration Guide

- `Mesh::get_vertex_buffer_data` has been renamed
`Mesh::create_packed_vertex_buffer_data` to reflect the fact that it
copies data and allocates.

## Showcase

- look mom, less copies
2024-10-01 17:15:57 +00:00
JMS55
9cc7e7c080
Meshlet screenspace-derived tangents (#15084)
* Save 16 bytes per vertex by calculating tangents in the shader at
runtime, rather than storing them in the vertex data.
* Based on https://jcgt.org/published/0009/03/04,
https://www.jeremyong.com/graphics/2023/12/16/surface-gradient-bump-mapping.
* Fixed visbuffer resolve to use the updated algorithm that flips ddy
correctly
* Added some more docs about meshlet material limitations, and some
TODOs about transforming UV coordinates for the future.


![image](https://github.com/user-attachments/assets/222d8192-8c82-4d77-945d-53670a503761)

For testing add a normal map to the bunnies with StandardMaterial like
below, and then test that on both main and this PR (make sure to
download the correct bunny for each). Results should be mostly
identical.

```rust
normal_map_texture: Some(asset_server.load_with_settings(
    "textures/BlueNoise-Normal.png",
    |settings: &mut ImageLoaderSettings| settings.is_srgb = false,
)),
```
2024-09-29 18:39:25 +00:00
Zachary Harrold
d70595b667
Add core and alloc over std Lints (#15281)
# Objective

- Fixes #6370
- Closes #6581

## Solution

- Added the following lints to the workspace:
  - `std_instead_of_core`
  - `std_instead_of_alloc`
  - `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.
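
For illustration, the kind of code the new lints steer contributors toward (a hypothetical snippet, not part of this PR):

```rust
// In a library crate root, add `extern crate alloc;` and prefer core/alloc paths.
extern crate alloc;

use alloc::{string::String, vec::Vec}; // instead of std::string / std::vec
use core::mem;                         // instead of std::mem

fn drain_labels(labels: &mut Vec<String>) -> Vec<String> {
    // mem::take works identically whether imported from core or std.
    mem::take(labels)
}
```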

## Testing

- Ran CI locally

## Migration Guide

The MSRV is now 1.81. Please update to this version or higher.

## Notes

- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
Clar Fon
efda7f3f9c
Simpler lint fixes: makes ci lints work but disables a lint for now (#15376)
Takes the first two commits from #15375 and adds suggestions from this
comment:
https://github.com/bevyengine/bevy/pull/15375#issuecomment-2366968300

See #15375 for more reasoning/motivation.

## Rebasing (rerunning)

```sh
git switch simpler-lint-fixes
git reset --hard main
cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate
cargo fmt --all
git add --update
git commit --message "rustfmt"
cargo clippy --workspace --all-targets --all-features --fix
cargo fmt --all -- --unstable-features --config normalize_comments=true,imports_granularity=Crate
cargo fmt --all
git add --update
git commit --message "clippy"
git cherry-pick e6c0b94f6795222310fb812fa5c4512661fc7887
```
2024-09-24 11:42:59 +00:00
JMS55
a0faf9cd01
More triangles/vertices per meshlet (#15023)
### Builder changes
- Increased meshlet max vertices/triangles from 64v/64t to 255v/128t
(meshoptimizer won't allow 256v sadly). This gives us a much greater
percentage of meshlets with max triangle count (128). Still not perfect,
we still end up with some tiny <=10 triangle meshlets that never really
get simplified, but it's progress.
- Removed the error target limit. Now we allow meshoptimizer to simplify
as much as possible. No reason to cap this out, as the cluster culling
code will choose a good LOD level anyways. Again leads to higher quality
LOD trees.
- After some discussion and consulting the Nanite slides again, changed
meshlet group error from _adding_ the max child's error to the group
error, to doing `group_error = max(group_error, max_child_error)`. Error
is already cumulative between LODs as the edges we're collapsing during
simplification get longer each time.
- Bumped the 65% simplification threshold to allow up to 95% of the
original geometry (e.g. accept simplification as valid even if we only
simplified 5% of the triangles). This gives us closer to
log2(initial_meshlet_count) LOD levels, and fewer meshlet roots in the
DAG.
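
A minimal sketch of the group error rule above, written as plain Rust rather than the builder's actual code:

```rust
// A group's error is the max of its own simplification error and its children's
// errors, not their sum, because error already accumulates across LOD levels.
fn meshlet_group_error(own_simplification_error: f32, child_errors: &[f32]) -> f32 {
    let max_child_error = child_errors.iter().copied().fold(0.0_f32, f32::max);
    own_simplification_error.max(max_child_error)
}
```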

Still more work to be done in the future here. Maybe trying METIS for
meshlet building instead of meshoptimizer.

Using ~8 clusters per group instead of ~4 might also make a big
difference. The Nanite slides say that they have 8-32 meshlets per
group, suggesting some kind of heuristic. Unfortunately meshopt's
compute_cluster_bounds won't work with large groups atm
(https://github.com/zeux/meshoptimizer/discussions/750#discussioncomment-10562641)
so hard to test.

Based on discussion from
https://github.com/bevyengine/bevy/discussions/14998,
https://github.com/zeux/meshoptimizer/discussions/750, and discord.

### Runtime changes
- cluster:triangle packed IDs are now stored 25:7 instead of 26:6 bits,
as max triangles per cluster are now 128 instead of 64
- Hardware raster now spawns 128 * 3 vertices instead of 64 * 3 vertices
to account for the new max triangles limit
- Hardware raster now outputs NaN triangles (0 / 0) instead of zero-positioned triangles for extra vertex invocations over the cluster triangle count. Shouldn't really make a difference, I think, but I did it anyways.
- Software raster now does 128 threads per workgroup instead of 64
threads. Each thread now loads, projects, and caches a vertex (vertices
0-127), and then if needed does so again (vertices 128-254). Each thread
then rasterizes one of 128 triangles.
- Fixed a bug with `needs_dispatch_remap`. I had the condition backwards
in my last PR, I probably committed it by accident after testing the
non-default code path on my GPU.
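
For illustration, packing helpers matching the new 25:7 cluster:triangle split mentioned at the top of this list (names and layout here are assumptions, not the actual shader code):

```rust
const TRIANGLE_BITS: u32 = 7; // up to 128 triangles per cluster

fn pack_ids(cluster_id: u32, triangle_id: u32) -> u32 {
    debug_assert!(cluster_id < (1 << (32 - TRIANGLE_BITS)));
    debug_assert!(triangle_id < (1 << TRIANGLE_BITS));
    (cluster_id << TRIANGLE_BITS) | triangle_id
}

fn unpack_ids(packed: u32) -> (u32, u32) {
    (packed >> TRIANGLE_BITS, packed & ((1 << TRIANGLE_BITS) - 1))
}
```
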
2024-09-08 17:55:57 +00:00
JMS55
6cc96f4c1f
Meshlet software raster + start of cleanup (#14623)
# Objective
- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs

## Solution
- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer,
where the left 32 bits encode depth, and the right 32 bits encode the
existing cluster + triangle IDs (see the packing sketch after this
list). Can't use 64bit textures, wgpu/naga doesn't support atomic ops on
textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per
triangle) to raster, the culling pass now writes out a buffer of just
cluster IDs (per cluster, so less memory allocated, cheaper to write
out).
  - Clusters for software raster are allocated from the left side
  - Clusters for hardware raster are allocated in the same buffer, from the right side
- The buffer size is fixed at MeshletPlugin build time, and should be
set to a reasonable value for your scene (no warning on overflow, and no
good way to determine what value you need outside of renderdoc - I plan
to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster
selection for each cluster. The existing code is just a placeholder. I
need to profile on a release scene and come up with a heuristic,
probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but
I don't want to spend time improving it as the entire shader/pass is
getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup
loads and transforms the <=64 vertices of the cluster, and then
rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger
triangles (still smaller than hardware is good at), and brute-force for
very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does
an atomicMax() on the visbuffer to store the results, copying how Nanite
works
- On devices with a low max workgroups per dispatch limit, an extra
compute pass is inserted before software raster to convert from a 1d to
2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet, I'm
leaving that for a future PR since I get usable results without it for
now
- Resources used:
https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters
6-8 of
https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet,
instead of the actual meshlet vertex count. Extra invocations just
early-exit.
- While this is slower than the existing system, hardware draws should
be rare now that software raster is usable, and it saves a ton of memory
using the unified cluster ID buffer. This would be fixed if wgpu had
support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster
pass also does the same atomic visbuffer writes that software raster
uses.
- We have to bind a dummy render target anyways, as wgpu doesn't
currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization
passes.
- If we had async compute queues, we could overlap the software and
hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer
node, and write out view depth and material ID depth textures
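
A plain-Rust sketch of the 64-bit visbuffer packing described in the first bullet of this list; the real implementation is WGSL, and a reverse-Z-style "larger depth bits win" convention is assumed here:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Depth in the high 32 bits, packed cluster + triangle ID in the low 32 bits, so a
// single atomic max keeps the nearest fragment and its IDs together.
fn pack_visbuffer_entry(depth: f32, cluster_and_triangle_id: u32) -> u64 {
    ((depth.to_bits() as u64) << 32) | cluster_and_triangle_id as u64
}

fn write_pixel(pixel: &AtomicU64, depth: f32, cluster_and_triangle_id: u32) {
    // Mirrors the shader's atomicMax(): the winning fragment's packed value survives.
    pixel.fetch_max(pack_visbuffer_entry(depth, cluster_and_triangle_id), Ordering::Relaxed);
}
```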

### Misc changes
- Fixed cluster culling importing, but never actually using, the previous view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet
mesh
- Split up the gpu_scene module into meshlet_mesh_manager, instance_manager, and resource_manager
- resource_manager is still too complex and inefficient (extract and
prepare are way too expensive). I plan on improving this in a future PR,
but for now ResourceManager is mostly a 1:1 port of the leftover
MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material
shade pass, as well as some other misc renaming (in the future, these
will be compute shaders even, and not actual draw calls)

---

## Migration Guide
- TBD (ask me at the end of the release for meshlet changes as a whole)

---------

Co-authored-by: vero <email@atlasdostal.com>
2024-08-26 17:54:34 +00:00
Sarthak Singh
2c4ef37b76
Changed Mesh::attributes* functions to return MeshVertexAttribute (#14394)
# Objective

Fixes #14365 

## Migration Guide

- When using the iterator returned by `Mesh::attributes` or
`Mesh::attributes_mut`, the first value of the tuple is now the
`MeshVertexAttribute` instead of the `MeshVertexAttributeId`. To access
the `MeshVertexAttributeId`, use the `MeshVertexAttribute.id` field.

Signed-off-by: Sarthak Singh <sarthak.singh99@gmail.com>
2024-08-12 15:54:28 +00:00
JMS55
6e8d43a037
Faster MeshletMesh deserialization (#14193)
# Objective
- Using bincode to deserialize binary into a MeshletMesh is expensive
(~77ms for a 5mb file).

## Solution
- Write a custom deserializer using bytemuck's Pod types and slice
casting.
  - Total asset load time has gone from ~102ms to ~12ms.
- Change some types I never meant to be public to private and other misc
cleanup.
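
A minimal sketch of the slice-casting approach (illustrative vertex layout, not the actual asset format):

```rust
use bytemuck::{Pod, Zeroable};

// A Pod struct can be reinterpreted directly from raw bytes with no per-field parsing.
#[derive(Clone, Copy, Pod, Zeroable)]
#[repr(C)]
struct PackedVertex {
    position: [f32; 3],
    normal: u32, // e.g. octahedral + snorm2x16 packed
}

fn read_vertices(bytes: &[u8]) -> &[PackedVertex] {
    // Zero-copy reinterpretation; panics if the length or alignment doesn't match.
    bytemuck::cast_slice(bytes)
}
```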

## Testing
- Ran the meshlet example and added timing spans to the asset loader.

---

## Changelog
- Improved `MeshletMesh` loading speed
- The `MeshletMesh` disk format has changed, and
`MESHLET_MESH_ASSET_VERSION` has been bumped
- `MeshletMesh` fields are now private
- Renamed `MeshletMeshSaverLoad` to `MeshletMeshSaverLoader`
- The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere`
types are now private
- Removed `MeshletMeshSaveOrLoadError::SerializationOrDeserialization`
- Added `MeshletMeshSaveOrLoadError::WrongFileType`

## Migration Guide
- Regenerate your `MeshletMesh` assets, as the disk format has changed,
and `MESHLET_MESH_ASSET_VERSION` has been bumped
- `MeshletMesh` fields are now private
- `MeshletMeshSaverLoad` is now named `MeshletMeshSaverLoader`
- The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere`
types are now private
- `MeshletMeshSaveOrLoadError::SerializationOrDeserialization` has been
removed
- Added `MeshletMeshSaveOrLoadError::WrongFileType`, match on this
variant if you match on `MeshletMeshSaveOrLoadError`
2024-07-15 15:06:02 +00:00
Arseny Kapoulkine
4cd188568a
Improve MeshletMesh::from_mesh performance further (#14038)
This change updates meshopt-rs to 0.3 to take advantage of the newly
added sparse simplification mode: by default, the simplifier assumes that the entire mesh is simplified and runs a set of calculations that are O(vertex count), but in our case we simplify many small mesh subsets, which is inefficient.

Sparse mode instead assumes that the simplified subset is only using a
portion of the vertex buffer, and optimizes accordingly. This changes
the meaning of the error (as it becomes relative to the subset, in our
case a meshlet group); to ensure consistent error selection, we also use
the ErrorAbsolute mode which allows us to operate in mesh coordinate
space.

Additionally, meshopt 0.3 runs optimizeMeshlet automatically as part of
`build_meshlets` so we no longer need to call it ourselves.

This reduces the time to build meshlet representation for Stanford Bunny
mesh from ~1.65s to ~0.45s (3.7x) in optimized builds.
2024-06-27 00:06:22 +00:00
Arseny Kapoulkine
6eec73a9a5
Make meshlet processing deterministic (#13913)
This is a followup to https://github.com/bevyengine/bevy/pull/13904 based on the discussion there, and switches two HashMaps that used meshlet ids as keys to Vecs.

In addition to a small further performance boost for `from_mesh` (1.66s
=> 1.60s), this makes processing deterministic modulo threading issues
wrt CRT rand described in the linked PR. This is valuable for debugging,
as you can visually or programmatically inspect the meshlet distribution
before/after making changes that should not change the output, whereas
previously every asset rebuild would change the meshlet structure.

Tested with https://github.com/bevyengine/bevy/pull/13431; after this
change, the visual output of meshlets is consistent between asset
rebuilds, and the MD5 of the output GLB file does not change either,
which was not the case before.
2024-06-20 00:58:43 +00:00
Arseny Kapoulkine
001cc147c6
Improve MeshletMesh::from_mesh performance (#13904)
This change reworks `find_connected_meshlets` to scale more linearly
with the mesh size, which significantly reduces the cost of building
meshlet representations. As a small extra complexity reduction, it moves the `simplify_scale` call out of the loop so that it's called once (it only depends on the vertex data, so it's safe to cache).

The new implementation of connectivity analysis builds an edge => meshlet list data structure, which allows us to only iterate through `tuple_combinations` of a (usually) small list. There is still some redundancy: if two meshlets share two edges, they will be represented in the meshlet lists twice, but it's overall much faster.

Since the hash traversal is non-deterministic, to keep this part of the
algorithm deterministic for reproducible results we sort the output
adjacency lists.
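
A minimal sketch of the edge => meshlet-list idea, in standalone Rust with illustrative types rather than the actual `find_connected_meshlets` code:

```rust
use std::collections::HashMap;

// Each meshlet is given as a list of its (vertex, vertex) edges; meshlets that share
// an edge entry are connected.
fn connected_meshlet_pairs(meshlet_edges: &[Vec<(u32, u32)>]) -> Vec<(usize, usize)> {
    let mut edge_to_meshlets: HashMap<(u32, u32), Vec<usize>> = HashMap::new();
    for (meshlet_id, edges) in meshlet_edges.iter().enumerate() {
        for &(a, b) in edges {
            // Normalize edge direction so (a, b) and (b, a) hash identically.
            let key = if a < b { (a, b) } else { (b, a) };
            edge_to_meshlets.entry(key).or_default().push(meshlet_id);
        }
    }
    let mut pairs = Vec::new();
    for meshlets in edge_to_meshlets.values() {
        for i in 0..meshlets.len() {
            for j in (i + 1)..meshlets.len() {
                pairs.push((meshlets[i], meshlets[j]));
            }
        }
    }
    // Sort and dedup for deterministic output, since hash traversal order varies.
    pairs.sort_unstable();
    pairs.dedup();
    pairs
}
```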

Overall this reduces the time to process bunny mesh from ~4.2s to ~1.7s
when using release; in unoptimized builds the delta is even more
significant.

This was tested by using https://github.com/bevyengine/bevy/pull/13431
and:

a) comparing the result of `find_connected_meshlets` using old and new
code; they are equal in all steps of the clustering process
b) comparing the rendered result of the old code vs new code *after*
making the rest of the algorithm deterministic: right now the loop that
iterates through the result of `group_meshlets()` call executes in
different order between program runs. This is orthogonal to this change
and can be fixed separately.

Note: a future change can shrink the processing time further from ~1.7s
to ~0.4s with a small diff but that requires an update to meshopt crate
which is pending in https://github.com/gwihlidal/meshopt-rs/pull/42.
This change is independent.
2024-06-18 08:29:17 +00:00
JMS55
6d6810c90d
Meshlet continuous LOD (#12755)
Adds a basic level of detail system to meshlets. An extremely brief
summary is as follows:
* In `from_mesh.rs`, once we've built the first level of clusters, we
group clusters, simplify the new mega-clusters, and then split the
simplified groups back into regular sized clusters. Repeat several times
(ideally until you can't anymore). This forms a directed acyclic graph
(DAG), where the children are the meshlets from the previous level, and
the parents are the more simplified versions of their children. The leaf
nodes are meshlets formed from the original mesh.
* In `cull_meshlets.wgsl`, each cluster selects whether to render or not
based on the LOD bounding sphere (different than the culling bounding
sphere) of the current meshlet, the LOD bounding sphere of its parent
(the meshlet group from simplification), and the simplification error
relative to its children of both the current meshlet and its parent
meshlet. This kind of breaks two pass occlusion culling, which will be
fixed in a future PR by using an HZB from the previous frame to get the
initial list of occluders.
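
A heavily simplified sketch of the per-cluster LOD test described above; the projection formula, threshold, and function names are illustrative, not the actual cull_meshlets.wgsl logic:

```rust
// Approximate screen-space size (in pixels) of a world-space error at a given distance.
fn projected_error_px(error: f32, sphere_radius: f32, distance: f32, screen_height: f32, fov_y: f32) -> f32 {
    let d = (distance - sphere_radius).max(f32::EPSILON);
    error / d * (screen_height / (2.0 * (fov_y * 0.5).tan()))
}

// Render a cluster when its own error is imperceptible but its parent group's is not.
fn should_render(self_error_px: f32, parent_error_px: f32, threshold_px: f32) -> bool {
    self_error_px <= threshold_px && parent_error_px > threshold_px
}
```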

Many, _many_ improvements to be done in the future
https://github.com/bevyengine/bevy/issues/11518, not least of which is
code quality and speed. I don't even expect this to work on many types
of input meshes. This is just a basic implementation/draft for
collaboration.

Arguable how much we want to do in this PR, I'll leave that up to
maintainers. I've erred on the side of "as basic as possible".

References:
* Slides 27-77 (video available on youtube)
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
*
https://blog.traverseresearch.nl/creating-a-directed-acyclic-graph-from-a-mesh-1329e57286e5
*
https://jglrxavpok.github.io/2024/01/19/recreating-nanite-lod-generation.html,
https://jglrxavpok.github.io/2024/03/12/recreating-nanite-faster-lod-generation.html,
https://jglrxavpok.github.io/2024/04/02/recreating-nanite-runtime-lod-selection.html,
and https://github.com/jglrxavpok/Carrot
*
https://github.com/gents83/INOX/tree/master/crates/plugins/binarizer/src
* https://cs418.cs.illinois.edu/website/text/nanite.html


![image](https://github.com/bevyengine/bevy/assets/47158642/e40bff9b-7d0c-4a19-a3cc-2aad24965977)

![image](https://github.com/bevyengine/bevy/assets/47158642/442c7da3-7761-4da7-9acd-37f15dd13e26)

---------

Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
2024-04-23 21:43:53 +00:00
JMS55
4f20faaa43
Meshlet rendering (initial feature) (#10164)
# Objective
- Implements a more efficient, GPU-driven
(https://github.com/bevyengine/bevy/issues/1342) rendering pipeline
based on meshlets.
- Meshes are split into small clusters of triangles called meshlets,
each of which acts as a mini index buffer into the larger mesh data.
Meshlets can be compressed, streamed, culled, and batched much more
efficiently than monolithic meshes.


![image](https://github.com/bevyengine/bevy/assets/47158642/cb2aaad0-7a9a-4e14-93b0-15d4e895b26a)

![image](https://github.com/bevyengine/bevy/assets/47158642/7534035b-1eb7-4278-9b99-5322e4401715)

# Misc
* Future work: https://github.com/bevyengine/bevy/issues/11518
* Nanite reference:
https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf
* Two pass occlusion culling explained very well:
https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501

---------

Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
2024-03-25 19:08:27 +00:00