forestia/bevy

Author	SHA1	Message	Date
newclarityex	ecccd57417	Generic system config (#17962 ) # Objective Prevents duplicate implementation between IntoSystemConfigs and IntoSystemSetConfigs using a generic, adds a NodeType trait for more config flexibility (opening the door to implement https://github.com/bevyengine/bevy/issues/14195?). ## Solution Followed writeup by @ItsDoot: https://hackmd.io/@doot/rJeefFHc1x Removes IntoSystemConfigs and IntoSystemSetConfigs, instead using IntoNodeConfigs with generics. ## Testing Pending --- ## Showcase N/A ## Migration Guide SystemSetConfigs -> NodeConfigs<InternedSystemSet> SystemConfigs -> NodeConfigs<ScheduleSystem> IntoSystemSetConfigs -> IntoNodeConfigs<InternedSystemSet, M> IntoSystemConfigs -> IntoNodeConfigs<ScheduleSystem, M> --------- Co-authored-by: Christian Hughes <9044780+ItsDoot@users.noreply.github.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>	2025-03-12 00:12:30 +00:00
Carter Anderson	06cb5c5fd9	Fix Component require() IDE integration (#18165 ) # Objective Component `require()` IDE integration is fully broken, as of #16575. ## Solution This reverts us back to the previous "put the docs on Component trait" impl. This _does_ reduce the accessibility of the required components in rust docs, but the complete erasure of "required component IDE experience" is not worth the price of slightly increased prominence of requires in docs. Additionally, Rust Analyzer has recently started including derive attributes in suggestions, so we aren't losing that benefit of the proc_macro attribute impl.	2025-03-06 02:44:47 +00:00
Patrick Walton	4880a231de	Implement occlusion culling for directional light shadow maps. (#17951 ) Two-phase occlusion culling can be helpful for shadow maps just as it can for a prepass, in order to reduce vertex and alpha mask fragment shading overhead. This patch implements occlusion culling for shadow maps from directional lights, when the `OcclusionCulling` component is present on the entities containing the lights. Shadow maps from point lights are deferred to a follow-up patch. Much of this patch involves expanding the hierarchical Z-buffer to cover shadow maps in addition to standard view depth buffers. The `scene_viewer` example has been updated to add `OcclusionCulling` to the directional light that it creates. This improved the performance of the rend3 sci-fi test scene when enabling shadows.	2025-02-21 05:56:15 +00:00
Patrick Walton	8de6b16e9d	Implement occlusion culling for the deferred rendering pipeline. (#17934 ) Deferred rendering currently doesn't support occlusion culling. This PR implements it in a straightforward way, mirroring what we already do for the non-deferred pipeline. On the rend3 sci-fi base test scene, this resulted in roughly a 2× speedup when applied on top of my other patches. For that scene, it was useful to add another option, `--add-light`, which forces the addition of a shadow-casting light, to the scene viewer, which I included in this patch.	2025-02-20 12:54:27 +00:00
JMS55	2fd4cc4937	Meshlet texture atomics (#17765 ) * Use texture atomics rather than buffer atomics for the visbuffer (haven't tested perf on a raster-heavy scene yet) * Unfortunately to clear the visbuffer we now need a compute pass to clear it. Using wgpu's clear_texture function internally uses a buffer -> image copy that's insanely expensive. Ideally it should be using vkCmdClearColorImage, which I've opened an issue for https://github.com/gfx-rs/wgpu/issues/7090. For now we'll have to stick with a custom compute pass and all the extra code that brings. * Faster resolve depth pass by discarding 0 depth pixels instead of redundantly writing zero (2x faster for big depth textures like shadow views)	2025-02-12 18:15:43 +00:00
JMS55	669d139c13	Upgrade to wgpu v24 (#17542 ) Didn't remove WgpuWrapper. Not sure if it's needed or not still. ## Testing - Did you test these changes? If so, how? Example runner - Are there any parts that need more testing? Web (portable atomics thingy?), DXC. ## Migration Guide - Bevy has upgraded to [wgpu v24](https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md#v2400-2025-01-15). - When using the DirectX 12 rendering backend, the new priority system for choosing a shader compiler is as follows: - If the `WGPU_DX12_COMPILER` environment variable is set at runtime, it is used - Else if the new `statically-linked-dxc` feature is enabled, a custom version of DXC will be statically linked into your app at compile time. - Else Bevy will look in the app's working directory for `dxcompiler.dll` and `dxil.dll` at runtime. - Else if they are missing, Bevy will fall back to FXC (not recommended) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: IceSentry <c.giguere42@gmail.com> Co-authored-by: François Mockers <francois.mockers@vleue.com>	2025-02-09 19:40:53 +00:00
Sludge	989f547080	Weak handle migration (#17695 ) # Objective - Make use of the new `weak_handle!` macro added in https://github.com/bevyengine/bevy/pull/17384 ## Solution - Migrate bevy from `Handle::weak_from_u128` to the new `weak_handle!` macro that takes a random UUID - Deprecate `Handle::weak_from_u128`, since there are no remaining use cases that can't also be addressed by constructing the type manually ## Testing - `cargo run -p ci -- test` --- ## Migration Guide Replace `Handle::weak_from_u128` with `weak_handle!` and a random UUID.	2025-02-05 22:44:20 +00:00
Patrick Walton	dda97880c4	Implement experimental GPU two-phase occlusion culling for the standard 3D mesh pipeline. (#17413 ) Occlusion culling allows the GPU to skip the vertex and fragment shading overhead for objects that can be quickly proved to be invisible because they're behind other geometry. A depth prepass already eliminates most fragment shading overhead for occluded objects, but the vertex shading overhead, as well as the cost of testing and rejecting fragments against the Z-buffer, is presently unavoidable for standard meshes. We currently perform occlusion culling only for meshlets. But other meshes, such as skinned meshes, can benefit from occlusion culling too in order to avoid the transform and skinning overhead for unseen meshes. This commit adapts the same [two-phase occlusion culling] technique that meshlets use to Bevy's standard 3D mesh pipeline when the new `OcclusionCulling` component, as well as the `DepthPrepass` component, are present on the camera. It has these steps: 1. Early depth prepass: We use the hierarchical Z-buffer from the previous frame to cull meshes for the initial depth prepass, effectively rendering only the meshes that were visible in the last frame. 2. Early depth downsample: We downsample the depth buffer to create another hierarchical Z-buffer, this time with the current view transform. 3. Late depth prepass: We use the new hierarchical Z-buffer to test all meshes that weren't rendered in the early depth prepass. Any meshes that pass this check are rendered. 4. Late depth downsample: Again, we downsample the depth buffer to create a hierarchical Z-buffer in preparation for the early depth prepass of the next frame. This step is done after all the rendering, in order to account for custom phase items that might write to the depth buffer. Note that this patch has no effect on the per-mesh CPU overhead for occluded objects, which remains high for a GPU-driven renderer due to the lack of `cold-specialization` and retained bins. If `cold-specialization` and retained bins weren't on the horizon, then a more traditional approach like potentially visible sets (PVS) or low-res CPU rendering would probably be more efficient than the GPU-driven approach that this patch implements for most scenes. However, at this point the amount of effort required to implement a PVS baking tool or a low-res CPU renderer would probably be greater than landing `cold-specialization` and retained bins, and the GPU driven approach is the more modern one anyway. It does mean that the performance improvements from occlusion culling as implemented in this patch today are likely to be limited, because of the high CPU overhead for occluded meshes. Note also that this patch currently doesn't implement occlusion culling for 2D objects or shadow maps. Those can be addressed in a follow-up. Additionally, note that the techniques in this patch require compute shaders, which excludes support for WebGL 2. This PR is marked experimental because of known precision issues with the downsampling approach when applied to non-power-of-two framebuffer sizes (i.e. most of them). These precision issues can, in rare cases, cause objects to be judged occluded that in fact are not. (I've never seen this in practice, but I know it's possible; it tends to be likelier to happen with small meshes.) As a follow-up to this patch, we desire to switch to the [SPD-based hi-Z buffer shader from the Granite engine], which doesn't suffer from these problems, at which point we should be able to graduate this feature from experimental status. I opted not to include that rewrite in this patch for two reasons: (1) @JMS55 is planning on doing the rewrite to coincide with the new availability of image atomic operations in Naga; (2) to reduce the scope of this patch. A new example, `occlusion_culling`, has been added. It demonstrates objects becoming quickly occluded and disoccluded by dynamic geometry and shows the number of objects that are actually being rendered. Also, a new `--occlusion-culling` switch has been added to `scene_viewer`, in order to make it easy to test this patch with large scenes like Bistro. [two-phase occlusion culling]: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501 [Aaltonen SIGGRAPH 2015]: https://www.advances.realtimerendering.com/s2015/aaltonenhaar_siggraph2015_combined_final_footer_220dpi.pdf [Some literature]: https://gist.github.com/reduz/c5769d0e705d8ab7ac187d63be0099b5?permalink_comment_id=5040452#gistcomment-5040452 [SPD-based hi-Z buffer shader from the Granite engine]: https://github.com/Themaister/Granite/blob/master/assets/shaders/post/hiz.comp ## Migration guide * When enqueuing a custom mesh pipeline, work item buffers are now created with `bevy::render::batching::gpu_preprocessing::get_or_create_work_item_buffer`, not `PreprocessWorkItemBuffers::new`. See the `specialized_mesh_pipeline` example. ## Showcase Occlusion culling example: ![Screenshot 2025-01-15 175051](https://github.com/user-attachments/assets/1544f301-68a3-45f8-84a6-7af3ad431258) Bistro zoomed out, before occlusion culling: ![Screenshot 2025-01-16 185425](https://github.com/user-attachments/assets/5114bbdf-5dec-4de9-b17e-7aa77e7b61ed) Bistro zoomed out, after occlusion culling: ![Screenshot 2025-01-16 184949](https://github.com/user-attachments/assets/9dd67713-656c-4276-9768-6d261ca94300) In this scene, occlusion culling reduces the number of meshes Bevy has to render from 1591 to 585.	2025-01-27 05:02:46 +00:00
Zachary Harrold	a371ee3019	Remove `tracing` re-export from `bevy_utils` (#17161 ) # Objective - Contributes to #11478 ## Solution - Made `bevy_utils::tracing` `doc(hidden)` - Re-exported `tracing` from `bevy_log` for end-users - Added `tracing` directly to crates that need it. ## Testing - CI --- ## Migration Guide If you were importing `tracing` via `bevy::utils::tracing`, instead use `bevy::log::tracing`. Note that many items within `tracing` are also directly re-exported from `bevy::log` as well, so you may only need `bevy::log` for the most common items (e.g., `warn!`, `trace!`, etc.). This also applies to the `log_once!` family of macros. ## Notes - While this doesn't reduce the line-count in `bevy_utils`, it further decouples the internal crates from `bevy_utils`, making its eventual removal more feasible in the future. - I have just imported `tracing` as we do for all dependencies. However, a workspace dependency may be more appropriate for version management.	2025-01-05 23:06:34 +00:00
Benjamin Brienen	7112d5594e	Remove all `deprecated` code (#16338 ) # Objective Release cycle things ## Solution Delete items deprecated in 0.15 and migrate bevy itself. ## Testing CI	2025-01-05 20:33:39 +00:00
Patrick Walton	40df1ea4b6	Remove the type parameter from `check_visibility`, and only invoke it once. (#16812 ) Currently, `check_visibility` is parameterized over a query filter that specifies the type of potentially-visible object. This has the unfortunate side effect that we need a separate system, `mark_view_visibility_as_changed_if_necessary`, to trigger view visibility change detection. That system is quite slow because it must iterate sequentially over all entities in the scene. This PR moves the query filter from `check_visibility` to a new component, `VisibilityClass`. `VisibilityClass` stores a list of type IDs, each corresponding to one of the query filters we used to use. Because `check_visibility` is no longer specialized to the query filter at the type level, Bevy now only needs to invoke it once, leading to better performance as `check_visibility` can do change detection on the fly rather than delegating it to a separate system. This commit also has ergonomic improvements, as there's no need for applications that want to add their own custom renderable components to add specializations of the `check_visibility` system to the schedule. Instead, they only need to ensure that the `ViewVisibility` component is properly kept up to date. The recommended way to do this, and the way that's demonstrated in the `custom_phase_item` and `specialized_mesh_pipeline` examples, is to make `ViewVisibility` a required component and to add the type ID to it in a component add hook. This patch does this for `Mesh3d`, `Mesh2d`, `Sprite`, `Light`, and `Node`, which means that most app code doesn't need to change at all. Note that, although this patch has a large impact on the performance of visibility determination, it doesn't actually improve the end-to-end frame time of `many_cubes`. That's because the render world was already effectively hiding the latency from `mark_view_visibility_as_changed_if_necessary`. This patch is, however, necessary for further improvements to `many_cubes` performance. `many_cubes` trace before: ![Screenshot 2024-12-13 015318](https://github.com/user-attachments/assets/d0b1881b-fb75-4a39-b05d-1a16eabfa2c5) `many_cubes` trace after: ![Screenshot 2024-12-13 145735](https://github.com/user-attachments/assets/0a364289-e942-41bb-9cc2-b05d07e3722d) ## Migration Guide * `check_visibility` no longer takes a `QueryFilter`, and there's no need to add it manually to your app schedule anymore for custom rendering items. Instead, entities with custom renderable components should add the appropriate type IDs to `VisibilityClass`. See `custom_phase_item` for an example.	2024-12-17 04:43:45 +00:00
SpecificProtagonist	d92fc1e456	Move required components doc to type doc (#16575 ) # Objective Make documentation of a component's required components more visible by moving it to the type's docs ## Solution Change `#[require]` from a derive macro helper to an attribute macro. Disadvantages: - this silences any unused code warnings on the component, as it is used by the macro! - need to import `require` if not using the ecs prelude (I have not included this in the migration guilde as Rust tooling already suggests the fix) --- ## Showcase ![Documentation of Camera](https://github.com/user-attachments/assets/3329511b-747a-4c8d-a43e-57f7c9c71a3c) --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: JMS55 <47158642+JMS55@users.noreply.github.com>	2024-12-03 19:45:20 +00:00
JMS55	d221665386	Native unclipped depth on supported platforms (#16095 ) # Objective - Fixes #16078 ## Solution - Rename things to clarify that we _want_ unclipped depth for directional light shadow views, and need some way of disabling the GPU's builtin depth clipping - Use DEPTH_CLIP_CONTROL instead of the fragment shader emulation on supported platforms - Pass only the clip position depth instead of the whole clip position between vertex->fragment shader (no idea if this helps performance or not, compiler might optimize it anyways) - Meshlets - HW raster always uses DEPTH_CLIP_CONTROL since it targets a more limited set of platforms - SW raster was not handling DEPTH_CLAMP_ORTHO correctly, it ended up pretty much doing nothing. - This PR made me realize that SW raster technically should have depth clipping for all views that are not directional light shadows, but I decided not to bother writing it. I'm not sure that it ever matters in practice. If proven otherwise, I can add it. ## Testing - Did you test these changes? If so, how? - Lighting example. Both opaque (no fragment shader) and alpha masked geometry (fragment shader emulation) are working with depth_clip_control, and both work when it's turned off. Also tested meshlet example. - Are there any parts that need more testing? - Performance. I can't figure out a good test scene. - How can other people (reviewers) test your changes? Is there anything specific they need to know? - Toggle depth_clip_control_supported in prepass/mod.rs line 323 to turn this PR on or off. - If relevant, what platforms did you test these changes on, and are there any important ones you can't test? - Native --- ## Migration Guide - `MeshPipelineKey::DEPTH_CLAMP_ORTHO` is now `MeshPipelineKey::UNCLIPPED_DEPTH_ORTHO` - The `DEPTH_CLAMP_ORTHO` shaderdef has been renamed to `UNCLIPPED_DEPTH_ORTHO_EMULATION` - `clip_position_unclamped: vec4<f32>` is now `unclipped_depth: f32`	2024-12-03 17:30:14 +00:00
JMS55	3ec09582c6	Fix meshlet private item regression (#16404 ) I didn't mean to make this item private, fixing it for the 0.15 release to be consistent with 0.14. (maintainers: please make sure this gets merged into the 0.15 release branch as well as main)	2024-11-16 22:06:26 +00:00
JMS55	267b57e565	Meshlet normal-aware LOD and meshoptimizer upgrade (#16111 ) # Objective - Choose LOD based on normal simplification error in addition to position error - Update meshoptimizer to 0.22, which has a bunch of simplifier improvements ## Testing - Did you test these changes? If so, how? - Visualize normals, and compare LOD changes before and after. Normals no longer visibly change as the LOD cut changes. - Are there any parts that need more testing? - No - How can other people (reviewers) test your changes? Is there anything specific they need to know? - Run the meshlet example in this PR and on main and move around to change the LOD cut. Before running each example, in meshlet_mesh_material.wgsl, replace `let color = vec3(rand_f(&rng), rand_f(&rng), rand_f(&rng));` with `let color = (vertex_output.world_normal + 1.0) / 2.0;`. Make sure to download the appropriate bunny asset for each branch!	2024-11-04 15:20:22 +00:00
JMS55	3fb6cefb2f	Meshlet fill cluster buffers rewritten (#15955 ) # Objective - Make the meshlet fill cluster buffers pass slightly faster - Address https://github.com/bevyengine/bevy/issues/15920 for meshlets - Added PreviousGlobalTransform as a required meshlet component to avoid extra archetype moves, slightly alleviating https://github.com/bevyengine/bevy/issues/14681 for meshlets - Enforce that MeshletPlugin::cluster_buffer_slots is not greater than 2^25 (glitches will occur otherwise). Technically this field controls post-lod/culling cluster count, and the issue is on pre-lod/culling cluster count, but it's still valid now, and in the future this will be more true. Needs to be merged after https://github.com/bevyengine/bevy/pull/15846 and https://github.com/bevyengine/bevy/pull/15886 ## Solution - Old pass dispatched a thread per cluster, and did a binary search over the instances to find which instance the cluster belongs to, and what meshlet index within the instance it is. - New pass dispatches a workgroup per instance, and has the workgroup loop over all meshlets in the instance in order to write out the cluster data. - Use a push constant instead of arrayLength to fix the linked bug - Remap 1d->2d dispatch for software raster only if actually needed to save on spawning excess workgroups ## Testing - Did you test these changes? If so, how? - Ran the meshlet example, and an example with 1041 instances of 32217 meshlets per instance. Profiled the second scene with nsight, went from 0.55ms -> 0.40ms. Small savings. We're pretty much VRAM bandwidth bound at this point. - How can other people (reviewers) test your changes? Is there anything specific they need to know? - Run the meshlet example ## Changelog (non-meshlets) - PreviousGlobalTransform now implements the Default trait	2024-10-23 19:18:49 +00:00
JMS55	9d54fe0370	Meshlet new error projection (#15846 ) * New error projection code taken from @zeux's meshoptimizer nanite.cpp demo for determining LOD (thanks zeux!) * Builder: `compute_lod_group_data()` * Runtime: `lod_error_is_imperceptible()`	2024-10-22 20:14:30 +00:00
Tim	3da0ef048e	Remove the `Component` trait implementation from `Handle` (#15796 ) # Objective - Closes #15716 - Closes #15718 ## Solution - Replace `Handle<MeshletMesh>` with a new `MeshletMesh3d` component - As expected there were some random things that needed fixing: - A couple tests were storing handles just to prevent them from being dropped I believe, which seems to have been unnecessary in some. - The `SpriteBundle` still had a `Handle<Image>` field. I've removed this. - Tests in `bevy_sprite` incorrectly added a `Handle<Image>` field outside of the `Sprite` component. - A few examples were still inserting `Handle`s, switched those to their corresponding wrappers. - 2 examples that were still querying for `Handle<Image>` were changed to query `Sprite` ## Testing - I've verified that the changed example work now ## Migration Guide `Handle` can no longer be used as a `Component`. All existing Bevy types using this pattern have been wrapped in their own semantically meaningful type. You should do the same for any custom `Handle` components your project needs. The `Handle<MeshletMesh>` component is now `MeshletMesh3d`. The `WithMeshletMesh` type alias has been removed. Use `With<MeshletMesh3d>` instead.	2024-10-09 21:10:01 +00:00
Joona Aalto	a2b53d46e7	Fix meshlet materials (#15755 ) # Objective After #15524, there are these bunny-shaped holes in rendering in the meshlet example! ![broken](https://github.com/user-attachments/assets/9e9f20ec-b820-44df-b961-68a1dee44002) This is because (1) they're using a raw asset handle instead of `MeshMaterial3d`, and (2) the system that extracts mesh materials into the render world has an unnecessary `With<Mesh3d>` filter, which makes it not account for meshlets. ## Solution Remove the redundant filter and use `MeshMaterial3d`. The bunnies got some paint! ![fixed](https://github.com/user-attachments/assets/adb42556-fd4b-4000-8ca8-1356250dd532)	2024-10-09 15:39:10 +00:00
JMS55	aa626e4f0b	Per-meshlet compressed vertex data (#15643 ) # Objective - Prepare for streaming by storing vertex data per-meshlet, rather than per-mesh (this means duplicating vertices per-meshlet) - Compress vertex data to reduce the cost of this ## Solution The important parts are in from_mesh.rs, the changes to the Meshlet type in asset.rs, and the changes in meshlet_bindings.wgsl. Everything else is pretty secondary/boilerplate/straightforward changes. - Positions are quantized in centimeters with a user-provided power of 2 factor (ideally auto-determined, but that's a TODO for the future), encoded as an offset relative to the minimum value within the meshlet, and then stored as a packed list of bits using the minimum number of bits needed for each vertex position channel for that meshlet - E.g. quantize positions (lossly, throws away precision that's not needed leading to using less bits in the bitstream encoding) - Get the min/max quantized value of each X/Y/Z channel of the quantized positions within a meshlet - Encode values relative to the min value of the meshlet. E.g. convert from [min, max] to [0, max - min] - The new max value in the meshlet is (max - min), which only takes N bits, so we only need N bits to store each channel within the meshlet (lossless) - We can store the min value and that it takes N bits per channel in the meshlet metadata, and reconstruct the position from the bitstream - Normals are octahedral encoded and than snorm2x16 packed and stored as a single u32. - Would be better to implement the precise variant of octhedral encoding for extra precision (no extra decode cost), but decided to keep it simple for now and leave that as a followup - Tried doing a quantizing and bitstream encoding scheme like I did for positions, but struggled to get it smaller. Decided to go with this for simplicity for now - UVs are uncompressed and take a full 64bits per vertex which is expensive - In the future this should be improved - Tangents, as of the previous PR, are not explicitly stored and are instead derived from screen space gradients - While I'm here, split up MeshletMeshSaverLoader into two separate types Other future changes include implementing a smaller encoding of triangle data (3 u8 indices = 24 bits per triangle currently), and more disk-oriented compression schemes. References: * "A Deep Dive into UE5's Nanite Virtualized Geometry" https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf#page=128 (also available on youtube) * "Towards Practical Meshlet Compression" https://arxiv.org/pdf/2404.06359 * "Vertex quantization in Omniforce Game Engine" https://daniilvinn.github.io/2024/05/04/omniforce-vertex-quantization.html ## Testing - Did you test these changes? If so, how? - Converted the stanford bunny, and rendered it with a debug material showing normals, and confirmed that it's identical to what's on main. EDIT: See additional testing in the comments below. - Are there any parts that need more testing? - Could use some more size comparisons on various meshes, and testing different quantization factors. Not sure if 4 is a good default. EDIT: See additional testing in the comments below. - Also did not test runtime performance of the shaders. EDIT: See additional testing in the comments below. - How can other people (reviewers) test your changes? Is there anything specific they need to know? - Use my unholy script, replacing the meshlet example https://paste.rs/7xQHk.rs (must make MeshletMesh fields pub instead of pub crate, must add lz4_flex as a dev-dependency) (must compile with meshlet and meshlet_processor features, mesh must have only positions, normals, and UVs, no vertex colors or tangents) --- ## Migration Guide - TBD by JMS55 at the end of the release	2024-10-08 18:42:55 +00:00
JMS55	6cc96f4c1f	Meshlet software raster + start of cleanup (#14623 ) # Objective - Faster meshlet rasterization path for small triangles - Avoid having to allocate and write out a triangle buffer - Refactor gpu_scene.rs ## Solution - Replace the 32bit visbuffer texture with a 64bit visbuffer buffer, where the left 32 bits encode depth, and the right 32 bits encode the existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga doesn't support atomic ops on textures yet. - Instead of writing out a buffer of packed cluster + triangle IDs (per triangle) to raster, the culling pass now writes out a buffer of just cluster IDs (per cluster, so less memory allocated, cheaper to write out). - Clusters for software raster are allocated from the left side - Clusters for hardware raster are allocated in the same buffer, from the right side - The buffer size is fixed at MeshletPlugin build time, and should be set to a reasonable value for your scene (no warning on overflow, and no good way to determine what value you need outside of renderdoc - I plan to fix this in a future PR adding a meshlet stats overlay) - Currently I don't have a heuristic for software vs hardware raster selection for each cluster. The existing code is just a placeholder. I need to profile on a release scene and come up with a heuristic, probably in a future PR. - The culling shader is getting pretty hard to follow at this point, but I don't want to spend time improving it as the entire shader/pass is getting rewritten/replaced in the near future. - Software raster is a compute workgroup per-cluster. Each workgroup loads and transforms the <=64 vertices of the cluster, and then rasterizes the <=64 triangles of the cluster. - Two variants are implemented: Scanline for clusters with any larger triangles (still smaller than hardware is good at), and brute-force for very very tiny triangles - Once the shader determines that a pixel should be filled in, it does an atomicMax() on the visbuffer to store the results, copying how Nanite works - On devices with a low max workgroups per dispatch limit, an extra compute pass is inserted before software raster to convert from a 1d to 2d dispatch (I don't think 3d would ever be necessary). - I haven't implemented the top-left rule or subpixel precision yet, I'm leaving that for a future PR since I get usable results without it for now - Resources used: https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters 6-8 of https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index - Hardware raster now spawns 64*3 vertex invocations per meshlet, instead of the actual meshlet vertex count. Extra invocations just early-exit. - While this is slower than the existing system, hardware draws should be rare now that software raster is usable, and it saves a ton of memory using the unified cluster ID buffer. This would be fixed if wgpu had support for mesh shaders. - Instead of writing to a color+depth attachment, the hardware raster pass also does the same atomic visbuffer writes that software raster uses. - We have to bind a dummy render target anyways, as wgpu doesn't currently support render passes without any attachments - Material IDs are no longer written out during the main rasterization passes. - If we had async compute queues, we could overlap the software and hardware raster passes. - New material and depth resolve passes run at the end of the visbuffer node, and write out view depth and material ID depth textures ### Misc changes - Fixed cluster culling importing, but never actually using the previous view uniforms when doing occlusion culling - Fixed incorrectly adding the LOD error twice when building the meshlet mesh - Splitup gpu_scene module into meshlet_mesh_manager, instance_manager, and resource_manager - resource_manager is still too complex and inefficient (extract and prepare are way too expensive). I plan on improving this in a future PR, but for now ResourceManager is mostly a 1:1 port of the leftover MeshletGpuScene bits. - Material draw passes have been renamed to the more accurate material shade pass, as well as some other misc renaming (in the future, these will be compute shaders even, and not actual draw calls) --- ## Migration Guide - TBD (ask me at the end of the release for meshlet changes as a whole) --------- Co-authored-by: vero <email@atlasdostal.com>	2024-08-26 17:54:34 +00:00
Jan Hohenheim	6f7c554daa	Fix common capitalization errors in documentation (#14562 ) WASM -> Wasm MacOS -> macOS Nothing important, just something that annoyed me for a while :)	2024-07-31 21:16:05 +00:00
charlotte	abaea01e30	Fixup Msaa docs. (#14442 ) Minor doc fixes missed in #14273	2024-07-22 21:37:25 +00:00
charlotte	03fd1b46ef	Move `Msaa` to component (#14273 ) Switches `Msaa` from being a globally configured resource to a per camera view component. Closes #7194 # Objective Allow individual views to describe their own MSAA settings. For example, when rendering to different windows or to different parts of the same view. ## Solution Make `Msaa` a component that is required on all camera bundles. ## Testing Ran a variety of examples to ensure that nothing broke. TODO: - [ ] Make sure android still works per previous comment in `extract_windows`. --- ## Migration Guide `Msaa` is no longer configured as a global resource, and should be specified on each spawned camera if a non-default setting is desired. --------- Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: François Mockers <francois.mockers@vleue.com>	2024-07-22 18:28:23 +00:00
JMS55	6e8d43a037	Faster MeshletMesh deserialization (#14193 ) # Objective - Using bincode to deserialize binary into a MeshletMesh is expensive (~77ms for a 5mb file). ## Solution - Write a custom deserializer using bytemuck's Pod types and slice casting. - Total asset load time has gone from ~102ms to ~12ms. - Change some types I never meant to be public to private and other misc cleanup. ## Testing - Ran the meshlet example and added timing spans to the asset loader. --- ## Changelog - Improved `MeshletMesh` loading speed - The `MeshletMesh` disk format has changed, and `MESHLET_MESH_ASSET_VERSION` has been bumped - `MeshletMesh` fields are now private - Renamed `MeshletMeshSaverLoad` to `MeshletMeshSaverLoader` - The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere` types are now private - Removed `MeshletMeshSaveOrLoadError::SerializationOrDeserialization` - Added `MeshletMeshSaveOrLoadError::WrongFileType` ## Migration Guide - Regenerate your `MeshletMesh` assets, as the disk format has changed, and `MESHLET_MESH_ASSET_VERSION` has been bumped - `MeshletMesh` fields are now private - `MeshletMeshSaverLoad` is now named `MeshletMeshSaverLoader` - The `Meshlet`, `MeshletBoundingSpheres`, and `MeshletBoundingSphere` types are now private - `MeshletMeshSaveOrLoadError::SerializationOrDeserialization` has been removed - Added `MeshletMeshSaveOrLoadError::WrongFileType`, match on this variant if you match on `MeshletMeshSaveOrLoadError`	2024-07-15 15:06:02 +00:00
JMS55	158ccc6d6a	Fix meshlet interactions with regular shading passes (#13816 ) * Fixes https://github.com/bevyengine/bevy/issues/13813 * Fixes https://github.com/bevyengine/bevy/issues/13810 Tested a combined scene with both regular meshes and meshlet meshes with: * Regular forward setup * Forward + normal/motion vector prepasses * Deferred (with depth prepass since that's required) * Deferred + depth/normal/motion vector prepasses Still broken: * Using meshlet meshes rendering in deferred and regular meshes rendering in forward + depth/normal prepass. I don't know how to fix this at the moment, so for now I've just add instructions to not mix them.	2024-06-21 19:06:08 +00:00
Jan Hohenheim	6273227e09	Fix lints introduced in Rust beta 1.80 (#13899 ) Resolves #13895 Mostly just involves being more explicit about which parts of the docs belong to a list and which begin a new paragraph. - found a few docs that were malformed because of exactly this, so I fixed that by introducing a paragraph - added indentation to nearly all multiline lists - fixed a few minor typos - added `#[allow(dead_code)]` to types that are needed to test annotations but are never constructed ([here](https://github.com/bevyengine/bevy/pull/13899/files#diff-b02b63604e569c8577c491e7a2030d456886d8f6716eeccd46b11df8aac75dafR1514) and [here](https://github.com/bevyengine/bevy/pull/13899/files#diff-b02b63604e569c8577c491e7a2030d456886d8f6716eeccd46b11df8aac75dafR1523)) - verified that `cargo +beta run -p ci -- lints` passes - verified that `cargo +beta run -p ci -- test` passes	2024-06-17 17:22:01 +00:00
JMS55	50ee483665	Meshlet misc (#13761 ) - Copy module docs so that they show up in the re-export - Change meshlet_id to cluster_id in the debug visualization - Small doc tweaks	2024-06-10 13:06:08 +00:00
JMS55	175e146228	Misc meshlet changes (#13705 ) * Rename cull_meshlets -> cull_clusters * Rename meshlet_visible -> cluster_visible * Add an if statement around meshlet_second_pass_candidates writes, maybe a small bit of performance.	2024-06-06 17:10:07 +00:00
JMS55	77ebabc4fe	Meshlet remove per-cluster data upload (#13125 ) # Objective - Per-cluster (instance of a meshlet) data upload is ridiculously expensive in both CPU and GPU time (8 bytes per cluster, millions of clusters, you very quickly run into PCIE bandwidth maximums, and lots of CPU-side copies and malloc). - We need to be uploading only per-instance/entity data. Anything else needs to be done on the GPU. ## Solution - Per instance, upload: - `meshlet_instance_meshlet_counts_prefix_sum` - An exclusive prefix sum over the count of how many clusters each instance has. - `meshlet_instance_meshlet_slice_starts` - The starting index of the meshlets for each instance within the `meshlets` buffer. - A new `fill_cluster_buffers` pass once at the start of the frame has a thread per cluster, and finds its instance ID and meshlet ID via a binary search of `meshlet_instance_meshlet_counts_prefix_sum` to find what instance it belongs to, and then uses that plus `meshlet_instance_meshlet_slice_starts` to find what number meshlet within the instance it is. The shader then writes out the per-cluster instance/meshlet ID buffers for later passes to quickly read from. - I've gone from 45 -> 180 FPS in my stress test scene, and saved ~30ms/frame of overall CPU/GPU time.	2024-05-04 19:56:19 +00:00
BD103	45bb6253e2	Restore dragons to their seat of power (#13124 ) # Objective - There is an unfortunate lack of dragons in the meshlet docs. - Dragons are symbolic of majesty, power, storms, and meshlets. - A dragon habitat such as our docs requires cultivation to ensure each winged lizard reaches their fullest, fiery selves. ## Solution - Fix the link to the dragon image. - The link originally targeted the `meshlet` branch, but that was later deleted after it was merged into `main`. --- ## Changelog - Added a dragon back into the `MeshletPlugin` documentation.	2024-04-28 07:20:16 +00:00
JMS55	e1a0da0fa6	Meshlet LOD-compatible two-pass occlusion culling (#12898 ) Keeping track of explicit visibility per cluster between frames does not work with LODs, and leads to worse culling (using the final depth buffer from the previous frame is more accurate). Instead, we need to generate a second depth pyramid after the second raster pass, and then use that in the first culling pass in the next frame to test if a cluster would have been visible last frame or not. As part of these changes, the write_index_buffer pass has been folded into the culling pass for a large performance gain, and to avoid tracking a lot of extra state that would be needed between passes. Prepass previous model/view stuff was adapted to work with meshlets as well. Also fixed a bug with materials, and other misc improvements. --------- Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: Patrick Walton <pcwalton@mimiga.net> Co-authored-by: Robert Swain <robert.swain@gmail.com>	2024-04-28 05:30:20 +00:00
JMS55	6d6810c90d	Meshlet continuous LOD (#12755 ) Adds a basic level of detail system to meshlets. An extremely brief summary is as follows: * In `from_mesh.rs`, once we've built the first level of clusters, we group clusters, simplify the new mega-clusters, and then split the simplified groups back into regular sized clusters. Repeat several times (ideally until you can't anymore). This forms a directed acyclic graph (DAG), where the children are the meshlets from the previous level, and the parents are the more simplified versions of their children. The leaf nodes are meshlets formed from the original mesh. * In `cull_meshlets.wgsl`, each cluster selects whether to render or not based on the LOD bounding sphere (different than the culling bounding sphere) of the current meshlet, the LOD bounding sphere of its parent (the meshlet group from simplification), and the simplification error relative to its children of both the current meshlet and its parent meshlet. This kind of breaks two pass occlusion culling, which will be fixed in a future PR by using an HZB from the previous frame to get the initial list of occluders. Many, _many_ improvements to be done in the future https://github.com/bevyengine/bevy/issues/11518, not least of which is code quality and speed. I don't even expect this to work on many types of input meshes. This is just a basic implementation/draft for collaboration. Arguable how much we want to do in this PR, I'll leave that up to maintainers. I've erred on the side of "as basic as possible". References: * Slides 27-77 (video available on youtube) https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf * https://blog.traverseresearch.nl/creating-a-directed-acyclic-graph-from-a-mesh-1329e57286e5 * https://jglrxavpok.github.io/2024/01/19/recreating-nanite-lod-generation.html, https://jglrxavpok.github.io/2024/03/12/recreating-nanite-faster-lod-generation.html, https://jglrxavpok.github.io/2024/04/02/recreating-nanite-runtime-lod-selection.html, and https://github.com/jglrxavpok/Carrot * https://github.com/gents83/INOX/tree/master/crates/plugins/binarizer/src * https://cs418.cs.illinois.edu/website/text/nanite.html ![image](https://github.com/bevyengine/bevy/assets/47158642/e40bff9b-7d0c-4a19-a3cc-2aad24965977) ![image](https://github.com/bevyengine/bevy/assets/47158642/442c7da3-7761-4da7-9acd-37f15dd13e26) --------- Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com> Co-authored-by: Patrick Walton <pcwalton@mimiga.net>	2024-04-23 21:43:53 +00:00
Brezak	f68bc01544	Run `CheckVisibility` after all the other visibility system sets have… (#12962 ) # Objective Make visibility system ordering explicit. Fixes #12953. ## Solution Specify `CheckVisibility` happens after all other `VisibilitySystems` sets have happened. --------- Co-authored-by: Elabajaba <Elabajaba@users.noreply.github.com>	2024-04-18 20:33:29 +00:00
Patrick Walton	8577a448f7	Fix rendering of sprites, text, and meshlets after #12582 . (#12945 ) `Sprite`, `Text`, and `Handle<MeshletMesh>` were types of renderable entities that the new segregated visible entity system didn't handle, so they didn't appear. Because `bevy_text` depends on `bevy_sprite`, and the visibility computation of text happens in the latter crate, I had to introduce a new marker component, `SpriteSource`. `SpriteSource` marks entities that aren't themselves sprites but become sprites during rendering. I added this component to `Text2dBundle`. Unfortunately, this is technically a breaking change, although I suspect it won't break anybody in practice except perhaps editors. Fixes #12935. ## Changelog ### Changed * `Text2dBundle` now includes a new marker component, `SpriteSource`. Bevy uses this internally to optimize visibility calculation. ## Migration Guide * `Text` now requires a `SpriteSource` marker component in order to appear. This component has been added to `Text2dBundle`.	2024-04-13 14:15:00 +00:00
Cameron	01649f13e2	Refactor `App` and `SubApp` internals for better separation (#9202 ) # Objective This is a necessary precursor to #9122 (this was split from that PR to reduce the amount of code to review all at once). Moving `!Send` resource ownership to `App` will make it unambiguously `!Send`. `SubApp` must be `Send`, so it can't wrap `App`. ## Solution Refactor `App` and `SubApp` to not have a recursive relationship. Since `SubApp` no longer wraps `App`, once `!Send` resources are moved out of `World` and into `App`, `SubApp` will become unambiguously `Send`. There could be less code duplication between `App` and `SubApp`, but that would break `App` method chaining. ## Changelog - `SubApp` no longer wraps `App`. - `App` fields are no longer publicly accessible. - `App` can no longer be converted into a `SubApp`. - Various methods now return references to a `SubApp` instead of an `App`. ## Migration Guide - To construct a sub-app, use `SubApp::new()`. `App` can no longer convert into `SubApp`. - If you implemented a trait for `App`, you may want to implement it for `SubApp` as well. - If you're accessing `app.world` directly, you now have to use `app.world()` and `app.world_mut()`. - `App::sub_app` now returns `&SubApp`. - `App::sub_app_mut` now returns `&mut SubApp`. - `App::get_sub_app` now returns `Option<&SubApp>.` - `App::get_sub_app_mut` now returns `Option<&mut SubApp>.`	2024-03-31 03:16:10 +00:00
JMS55	4f20faaa43	Meshlet rendering (initial feature) (#10164 ) # Objective - Implements a more efficient, GPU-driven (https://github.com/bevyengine/bevy/issues/1342) rendering pipeline based on meshlets. - Meshes are split into small clusters of triangles called meshlets, each of which acts as a mini index buffer into the larger mesh data. Meshlets can be compressed, streamed, culled, and batched much more efficiently than monolithic meshes. ![image](https://github.com/bevyengine/bevy/assets/47158642/cb2aaad0-7a9a-4e14-93b0-15d4e895b26a) ![image](https://github.com/bevyengine/bevy/assets/47158642/7534035b-1eb7-4278-9b99-5322e4401715) # Misc * Future work: https://github.com/bevyengine/bevy/issues/11518 * Nanite reference: https://advances.realtimerendering.com/s2021/Karis_Nanite_SIGGRAPH_Advances_2021_final.pdf Two pass occlusion culling explained very well: https://medium.com/@mil_kru/two-pass-occlusion-culling-4100edcad501 --------- Co-authored-by: Ricky Taylor <rickytaylor26@gmail.com> Co-authored-by: vero <email@atlasdostal.com> Co-authored-by: François <mockersf@gmail.com> Co-authored-by: atlas dostal <rodol@rivalrebels.com>	2024-03-25 19:08:27 +00:00

37 Commits