Commit Graph

6 Commits

Author SHA1 Message Date
Aevyrie
bed9ddf3ce
Refactor and simplify custom projections (#17063)
# Objective

- Fixes https://github.com/bevyengine/bevy/issues/16556
- Closes https://github.com/bevyengine/bevy/issues/11807

## Solution

- Simplify custom projections by using a single source of truth -
`Projection`, removing all existing generic systems and types.
- Existing perspective and orthographic structs are no longer components
- I could dissolve these to simplify further, but keeping them around
was the fast way to implement this.
- Instead of generics, introduce a third variant, with a trait object.
- Do an object safety dance with an intermediate trait to allow cloning
boxed camera projections. This is a normal rust polymorphism papercut.
You can do this with a crate but a manual impl is short and sweet.

## Testing

- Added a custom projection example

---

## Showcase

- Custom projections and projection handling has been simplified.
- Projection systems are no longer generic, with the potential for many
different projection components on the same camera.
- Instead `Projection` is now the single source of truth for camera
projections, and is the only projection component.
- Custom projections are still supported, and can be constructed with
`Projection::custom()`.

## Migration Guide

- `PerspectiveProjection` and `OrthographicProjection` are no longer
components. Use `Projection` instead.
- Custom projections should no longer be inserted as a component.
Instead, simply set the custom projection as a value of `Projection`
with `Projection::custom()`.
2025-01-01 20:44:24 +00:00
Patrick Walton
5adf831b42
Add a bindless mode to AsBindGroup. (#16368)
This patch adds the infrastructure necessary for Bevy to support
*bindless resources*, by adding a new `#[bindless]` attribute to
`AsBindGroup`.

Classically, only a single texture (or sampler, or buffer) can be
attached to each shader binding. This means that switching materials
requires breaking a batch and issuing a new drawcall, even if the mesh
is otherwise identical. This adds significant overhead not only in the
driver but also in `wgpu`, as switching bind groups increases the amount
of validation work that `wgpu` must do.

*Bindless resources* are the typical solution to this problem. Instead
of switching bindings between each texture, the renderer instead
supplies a large *array* of all textures in the scene up front, and the
material contains an index into that array. This pattern is repeated for
buffers and samplers as well. The renderer now no longer needs to switch
binding descriptor sets while drawing the scene.

Unfortunately, as things currently stand, this approach won't quite work
for Bevy. Two aspects of `wgpu` conspire to make this ideal approach
unacceptably slow:

1. In the DX12 backend, all binding arrays (bindless resources) must
have a constant size declared in the shader, and all textures in an
array must be bound to actual textures. Changing the size requires a
recompile.

2. Changing even one texture incurs revalidation of all textures, a
process that takes time that's linear in the total size of the binding
array.

This means that declaring a large array of textures big enough to
encompass the entire scene is presently unacceptably slow. For example,
if you declare 4096 textures, then `wgpu` will have to revalidate all
4096 textures if even a single one changes. This process can take
multiple frames.

To work around this problem, this PR groups bindless resources into
small *slabs* and maintains a free list for each. The size of each slab
for the bindless arrays associated with a material is specified via the
`#[bindless(N)]` attribute. For instance, consider the following
declaration:

```rust
#[derive(AsBindGroup)]
#[bindless(16)]
struct MyMaterial {
    #[buffer(0)]
    color: Vec4,
    #[texture(1)]
    #[sampler(2)]
    diffuse: Handle<Image>,
}
```

The `#[bindless(N)]` attribute specifies that, if bindless arrays are
supported on the current platform, each resource becomes a binding array
of N instances of that resource. So, for `MyMaterial` above, the `color`
attribute is exposed to the shader as `binding_array<vec4<f32>, 16>`,
the `diffuse` texture is exposed to the shader as
`binding_array<texture_2d<f32>, 16>`, and the `diffuse` sampler is
exposed to the shader as `binding_array<sampler, 16>`. Inside the
material's vertex and fragment shaders, the applicable index is
available via the `material_bind_group_slot` field of the `Mesh`
structure. So, for instance, you can access the current color like so:

```wgsl
// `uniform` binding arrays are a non-sequitur, so `uniform` is automatically promoted
// to `storage` in bindless mode.
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
    ...
}
```

Note that portable shader code can't guarantee that the current platform
supports bindless textures. Indeed, bindless mode is only available in
Vulkan and DX12. The `BINDLESS` shader definition is available for your
use to determine whether you're on a bindless platform or not. Thus a
portable version of the shader above would look like:

```wgsl
#ifdef BINDLESS
@group(2) @binding(0) var<storage> material_color: binding_array<Color, 4>;
#else // BINDLESS
@group(2) @binding(0) var<uniform> material_color: Color;
#endif // BINDLESS
...
@fragment
fn fragment(in: VertexOutput) -> @location(0) vec4<f32> {
#ifdef BINDLESS
    let color = material_color[mesh[in.instance_index].material_bind_group_slot];
#else // BINDLESS
    let color = material_color;
#endif // BINDLESS
    ...
}
```

Importantly, this PR *doesn't* update `StandardMaterial` to be bindless.
So, for example, `scene_viewer` will currently not run any faster. I
intend to update `StandardMaterial` to use bindless mode in a follow-up
patch.

A new example, `shaders/shader_material_bindless`, has been added to
demonstrate how to use this new feature.

Here's a Tracy profile of `submit_graph_commands` of this patch and an
additional patch (not submitted yet) that makes `StandardMaterial` use
bindless. Red is those patches; yellow is `main`. The scene was Bistro
Exterior with a hack that forces all textures to opaque. You can see a
1.47x mean speedup.
![Screenshot 2024-11-12
161713](https://github.com/user-attachments/assets/4334b362-42c8-4d64-9cfb-6835f019b95c)

## Migration Guide

* `RenderAssets::prepare_asset` now takes an `AssetId` parameter.
* Bin keys now have Bevy-specific material bind group indices instead of
`wgpu` material bind group IDs, as part of the bindless change. Use the
new `MaterialBindGroupAllocator` to map from bind group index to bind
group ID.
2024-12-03 18:00:34 +00:00
atlv
c29e67153b
Expose Pipeline Compilation Zero Initialize Workgroup Memory Option (#16301)
# Objective

- wgpu 0.20 made workgroup vars stop being zero-init by default. this
broke some applications (cough foresight cough) and now we workaround
it. wgpu exposes a compilation option that zero initializes workgroup
memory by default, but bevy does not expose it.

## Solution

- expose the compilation option wgpu gives us

## Testing

- ran examples: 3d_scene, compute_shader_game_of_life, gpu_readback,
lines, specialized_mesh_pipeline. they all work
- confirmed fix for our own problems

---

</details>

## Migration Guide

- add `zero_initialize_workgroup_memory: false,` to
`ComputePipelineDescriptor` or `RenderPipelineDescriptor` structs to
preserve 0.14 functionality, add `zero_initialize_workgroup_memory:
true,` to restore bevy 0.13 functionality.
2024-11-08 21:42:37 +00:00
Zachary Harrold
d70595b667
Add core and alloc over std Lints (#15281)
# Objective

- Fixes #6370
- Closes #6581

## Solution

- Added the following lints to the workspace:
  - `std_instead_of_core`
  - `std_instead_of_alloc`
  - `alloc_instead_of_core`
- Used `cargo +nightly fmt` with [item level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Item%5C%3A)
to split all `use` statements into single items.
- Used `cargo clippy --workspace --all-targets --all-features --fix
--allow-dirty` to _attempt_ to resolve the new linting issues, and
intervened where the lint was unable to resolve the issue automatically
(usually due to needing an `extern crate alloc;` statement in a crate
root).
- Manually removed certain uses of `std` where negative feature gating
prevented `--all-features` from finding the offending uses.
- Used `cargo +nightly fmt` with [crate level use
formatting](https://rust-lang.github.io/rustfmt/?version=v1.6.0&search=#Crate%5C%3A)
to re-merge all `use` statements matching Bevy's previous styling.
- Manually fixed cases where the `fmt` tool could not re-merge `use`
statements due to conditional compilation attributes.

## Testing

- Ran CI locally

## Migration Guide

The MSRV is now 1.81. Please update to this version or higher.

## Notes

- This is a _massive_ change to try and push through, which is why I've
outlined the semi-automatic steps I used to create this PR, in case this
fails and someone else tries again in the future.
- Making this change has no impact on user code, but does mean Bevy
contributors will be warned to use `core` and `alloc` instead of `std`
where possible.
- This lint is a critical first step towards investigating `no_std`
options for Bevy.

---------

Co-authored-by: François Mockers <francois.mockers@vleue.com>
2024-09-27 00:59:59 +00:00
Joona Aalto
afbbbd7335
Rename rendering components for improved consistency and clarity (#15035)
# Objective

The names of numerous rendering components in Bevy are inconsistent and
a bit confusing. Relevant names include:

- `AutoExposureSettings`
- `AutoExposureSettingsUniform`
- `BloomSettings`
- `BloomUniform` (no `Settings`)
- `BloomPrefilterSettings`
- `ChromaticAberration` (no `Settings`)
- `ContrastAdaptiveSharpeningSettings`
- `DepthOfFieldSettings`
- `DepthOfFieldUniform` (no `Settings`)
- `FogSettings`
- `SmaaSettings`, `Fxaa`, `TemporalAntiAliasSettings` (really
inconsistent??)
- `ScreenSpaceAmbientOcclusionSettings`
- `ScreenSpaceReflectionsSettings`
- `VolumetricFogSettings`

Firstly, there's a lot of inconsistency between `Foo`/`FooSettings` and
`FooUniform`/`FooSettingsUniform` and whether names are abbreviated or
not.

Secondly, the `Settings` post-fix seems unnecessary and a bit confusing
semantically, since it makes it seem like the component is mostly just
auxiliary configuration instead of the core *thing* that actually
enables the feature. This will be an even bigger problem once bundles
like `TemporalAntiAliasBundle` are deprecated in favor of required
components, as users will expect a component named `TemporalAntiAlias`
(or similar), not `TemporalAntiAliasSettings`.

## Solution

Drop the `Settings` post-fix from the component names, and change some
names to be more consistent.

- `AutoExposure`
- `AutoExposureUniform`
- `Bloom`
- `BloomUniform`
- `BloomPrefilter`
- `ChromaticAberration`
- `ContrastAdaptiveSharpening`
- `DepthOfField`
- `DepthOfFieldUniform`
- `DistanceFog`
- `Smaa`, `Fxaa`, `TemporalAntiAliasing` (note: we might want to change
to `Taa`, see "Discussion")
- `ScreenSpaceAmbientOcclusion`
- `ScreenSpaceReflections`
- `VolumetricFog`

I kept the old names as deprecated type aliases to make migration a bit
less painful for users. We should remove them after the next release.
(And let me know if I should just... not add them at all)

I also added some very basic docs for a few types where they were
missing, like on `Fxaa` and `DepthOfField`.

## Discussion

- `TemporalAntiAliasing` is still inconsistent with `Smaa` and `Fxaa`.
Consensus [on
Discord](https://discord.com/channels/691052431525675048/743663924229963868/1280601167209955431)
seemed to be that renaming to `Taa` would probably be fine, but I think
it's a bit more controversial, and it would've required renaming a lot
of related types like `TemporalAntiAliasNode`,
`TemporalAntiAliasBundle`, and `TemporalAntiAliasPlugin`, so I think
it's better to leave to a follow-up.
- I think `Fog` should probably have a more specific name like
`DistanceFog` considering it seems to be distinct from `VolumetricFog`.
~~This should probably be done in a follow-up though, so I just removed
the `Settings` post-fix for now.~~ (done)

---

## Migration Guide

Many rendering components have been renamed for improved consistency and
clarity.

- `AutoExposureSettings` → `AutoExposure`
- `BloomSettings` → `Bloom`
- `BloomPrefilterSettings` → `BloomPrefilter`
- `ContrastAdaptiveSharpeningSettings` → `ContrastAdaptiveSharpening`
- `DepthOfFieldSettings` → `DepthOfField`
- `FogSettings` → `DistanceFog`
- `SmaaSettings` → `Smaa`
- `TemporalAntiAliasSettings` → `TemporalAntiAliasing`
- `ScreenSpaceAmbientOcclusionSettings` → `ScreenSpaceAmbientOcclusion`
- `ScreenSpaceReflectionsSettings` → `ScreenSpaceReflections`
- `VolumetricFogSettings` → `VolumetricFog`

---------

Co-authored-by: Carter Anderson <mcanders1@gmail.com>
2024-09-10 01:11:46 +00:00
JMS55
6cc96f4c1f
Meshlet software raster + start of cleanup (#14623)
# Objective
- Faster meshlet rasterization path for small triangles
- Avoid having to allocate and write out a triangle buffer
- Refactor gpu_scene.rs

## Solution
- Replace the 32bit visbuffer texture with a 64bit visbuffer buffer,
where the left 32 bits encode depth, and the right 32 bits encode the
existing cluster + triangle IDs. Can't use 64bit textures, wgpu/naga
doesn't support atomic ops on textures yet.
- Instead of writing out a buffer of packed cluster + triangle IDs (per
triangle) to raster, the culling pass now writes out a buffer of just
cluster IDs (per cluster, so less memory allocated, cheaper to write
out).
  - Clusters for software raster are allocated from the left side
- Clusters for hardware raster are allocated in the same buffer, from
the right side
- The buffer size is fixed at MeshletPlugin build time, and should be
set to a reasonable value for your scene (no warning on overflow, and no
good way to determine what value you need outside of renderdoc - I plan
to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster
selection for each cluster. The existing code is just a placeholder. I
need to profile on a release scene and come up with a heuristic,
probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but
I don't want to spend time improving it as the entire shader/pass is
getting rewritten/replaced in the near future.
- Software raster is a compute workgroup per-cluster. Each workgroup
loads and transforms the <=64 vertices of the cluster, and then
rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger
triangles (still smaller than hardware is good at), and brute-force for
very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does
an atomicMax() on the visbuffer to store the results, copying how Nanite
works
- On devices with a low max workgroups per dispatch limit, an extra
compute pass is inserted before software raster to convert from a 1d to
2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet, I'm
leaving that for a future PR since I get usable results without it for
now
- Resources used:
https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters
6-8 of
https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
- Hardware raster now spawns 64*3 vertex invocations per meshlet,
instead of the actual meshlet vertex count. Extra invocations just
early-exit.
- While this is slower than the existing system, hardware draws should
be rare now that software raster is usable, and it saves a ton of memory
using the unified cluster ID buffer. This would be fixed if wgpu had
support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster
pass also does the same atomic visbuffer writes that software raster
uses.
- We have to bind a dummy render target anyways, as wgpu doesn't
currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization
passes.
- If we had async compute queues, we could overlap the software and
hardware raster passes.
- New material and depth resolve passes run at the end of the visbuffer
node, and write out view depth and material ID depth textures

### Misc changes
- Fixed cluster culling importing, but never actually using the previous
view uniforms when doing occlusion culling
- Fixed incorrectly adding the LOD error twice when building the meshlet
mesh
- Splitup gpu_scene module into meshlet_mesh_manager, instance_manager,
and resource_manager
- resource_manager is still too complex and inefficient (extract and
prepare are way too expensive). I plan on improving this in a future PR,
but for now ResourceManager is mostly a 1:1 port of the leftover
MeshletGpuScene bits.
- Material draw passes have been renamed to the more accurate material
shade pass, as well as some other misc renaming (in the future, these
will be compute shaders even, and not actual draw calls)

---

## Migration Guide
- TBD (ask me at the end of the release for meshlet changes as a whole)

---------

Co-authored-by: vero <email@atlasdostal.com>
2024-08-26 17:54:34 +00:00