# Objective
- Fixes#10909
- Fixes#8492
## Solution
- Name all matrices `x_from_y`, for example `world_from_view`.
## Testing
- I've tested most of the 3D examples. The `lighting` example
particularly should hit a lot of the changes and appears to run fine.
---
## Changelog
- Renamed matrices across the engine to follow a `y_from_x` naming,
making the space conversion more obvious.
## Migration Guide
- `Frustum`'s `from_view_projection`, `from_view_projection_custom_far`
and `from_view_projection_no_far` were renamed to
`from_clip_from_world`, `from_clip_from_world_custom_far` and
`from_clip_from_world_no_far`.
- `ComputedCameraValues::projection_matrix` was renamed to
`clip_from_view`.
- `CameraProjection::get_projection_matrix` was renamed to
`get_clip_from_view` (this affects implementations on `Projection`,
`PerspectiveProjection` and `OrthographicProjection`).
- `ViewRangefinder3d::from_view_matrix` was renamed to
`from_world_from_view`.
- `PreviousViewData`'s members were renamed to `view_from_world` and
`clip_from_world`.
- `ExtractedView`'s `projection`, `transform` and `view_projection` were
renamed to `clip_from_view`, `world_from_view` and `clip_from_world`.
- `ViewUniform`'s `view_proj`, `unjittered_view_proj`,
`inverse_view_proj`, `view`, `inverse_view`, `projection` and
`inverse_projection` were renamed to `clip_from_world`,
`unjittered_clip_from_world`, `world_from_clip`, `world_from_view`,
`view_from_world`, `clip_from_view` and `view_from_clip`.
- `GpuDirectionalCascade::view_projection` was renamed to
`clip_from_world`.
- `MeshTransforms`' `transform` and `previous_transform` were renamed to
`world_from_local` and `previous_world_from_local`.
- `MeshUniform`'s `transform`, `previous_transform`,
`inverse_transpose_model_a` and `inverse_transpose_model_b` were renamed
to `world_from_local`, `previous_world_from_local`,
`local_from_world_transpose_a` and `local_from_world_transpose_b` (the
`Mesh` type in WGSL mirrors this, however `transform` and
`previous_transform` were named `model` and `previous_model`).
- `Mesh2dTransforms::transform` was renamed to `world_from_local`.
- `Mesh2dUniform`'s `transform`, `inverse_transpose_model_a` and
`inverse_transpose_model_b` were renamed to `world_from_local`,
`local_from_world_transpose_a` and `local_from_world_transpose_b` (the
`Mesh2d` type in WGSL mirrors this).
- In WGSL, in `bevy_pbr::mesh_functions`, `get_model_matrix` and
`get_previous_model_matrix` were renamed to `get_world_from_local` and
`get_previous_world_from_local`.
- In WGSL, `bevy_sprite::mesh2d_functions::get_model_matrix` was renamed
to `get_world_from_local`.
# Objective
- Using multiple raster passes to generate the depth pyramid is
extremely slow
- Pulling data from the source image is the largest bottleneck, it's
important to sample in a cache-aware pattern
- Barriers and pipeline drain between the raster passes is the second
largest bottleneck
- Each separate RenderPass on the CPU is _really_ expensive
## Solution
- Port [FidelityFX SPD](https://gpuopen.com/fidelityfx-spd) to WGSL,
replacing meshlet's existing multiple raster passes with a ~~single~~
two compute dispatches. Lack of coherent buffers means we have to do the
the last 64x64 tile from mip 7+ in a separate dispatch to ensure the mip
6 writes were flushed :(
- Workgroup shared memory version only at the moment, as the subgroup
operation is blocked by our upgrade to wgpu 0.20 #13186
- Don't enforce a power-of-2 depth pyramid texture size, simply scaling
by 0.5 is fine
# Objective
- Add motion vector support to the skybox
- This fixes the last remaining "gap" to complete the motion blur
feature
## Solution
- Add a pipeline for the skybox to write motion vectors to the prepass
## Testing
- Used examples to test motion vectors using motion blur
https://github.com/bevyengine/bevy/assets/2632925/74c0778a-7e77-4e68-8111-05791e4bfdd2
---------
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
# Objective
- Per-cluster (instance of a meshlet) data upload is ridiculously
expensive in both CPU and GPU time (8 bytes per cluster, millions of
clusters, you very quickly run into PCIE bandwidth maximums, and lots of
CPU-side copies and malloc).
- We need to be uploading only per-instance/entity data. Anything else
needs to be done on the GPU.
## Solution
- Per instance, upload:
- `meshlet_instance_meshlet_counts_prefix_sum` - An exclusive prefix sum
over the count of how many clusters each instance has.
- `meshlet_instance_meshlet_slice_starts` - The starting index of the
meshlets for each instance within the `meshlets` buffer.
- A new `fill_cluster_buffers` pass once at the start of the frame has a
thread per cluster, and finds its instance ID and meshlet ID via a
binary search of `meshlet_instance_meshlet_counts_prefix_sum` to find
what instance it belongs to, and then uses that plus
`meshlet_instance_meshlet_slice_starts` to find what number meshlet
within the instance it is. The shader then writes out the per-cluster
instance/meshlet ID buffers for later passes to quickly read from.
- I've gone from 45 -> 180 FPS in my stress test scene, and saved
~30ms/frame of overall CPU/GPU time.
Keeping track of explicit visibility per cluster between frames does not
work with LODs, and leads to worse culling (using the final depth buffer
from the previous frame is more accurate).
Instead, we need to generate a second depth pyramid after the second
raster pass, and then use that in the first culling pass in the next
frame to test if a cluster would have been visible last frame or not.
As part of these changes, the write_index_buffer pass has been folded
into the culling pass for a large performance gain, and to avoid
tracking a lot of extra state that would be needed between passes.
Prepass previous model/view stuff was adapted to work with meshlets as
well.
Also fixed a bug with materials, and other misc improvements.
---------
Co-authored-by: François <mockersf@gmail.com>
Co-authored-by: atlas dostal <rodol@rivalrebels.com>
Co-authored-by: vero <email@atlasdostal.com>
Co-authored-by: Patrick Walton <pcwalton@mimiga.net>
Co-authored-by: Robert Swain <robert.swain@gmail.com>
# Objective
- Fixes#12976
## Solution
This one is a doozy.
- Run `cargo +beta clippy --workspace --all-targets --all-features` and
fix all issues
- This includes:
- Moving inner attributes to be outer attributes, when the item in
question has both inner and outer attributes
- Use `ptr::from_ref` in more scenarios
- Extend the valid idents list used by `clippy:doc_markdown` with more
names
- Use `Clone::clone_from` when possible
- Remove redundant `ron` import
- Add backticks to **so many** identifiers and items
- I'm sorry whoever has to review this
---
## Changelog
- Added links to more identifiers in documentation.
# Objective
Since BufferVec was first introduced, `bytemuck` has added additional
traits with fewer restrictions than `Pod`. Within BufferVec, we only
rely on the constraints of `bytemuck::cast_slice` to a `u8` slice, which
now only requires `T: NoUninit` which is a strict superset of `Pod`
types.
## Solution
Change out the `Pod` generic type constraint with `NoUninit`. Also
taking the opportunity to substitute `cast_slice` with
`must_cast_slice`, which avoids a runtime panic in place of a compile
time failure if `T` cannot be used.
---
## Changelog
Changed: `BufferVec` now supports working with types containing
`NoUninit` but not `Pod` members.
Changed: `BufferVec` will now fail to compile if used with a type that
cannot be safely read from. Most notably, this includes ZSTs, which
would previously always panic at runtime.