bevy/crates at 9c0fca072d817b9da5510c0096d5aa97b024e36e - bevy

Gonçalo Rica Pais da Silva 9c0fca072d Optimise `Entity` with repr align & manual `PartialOrd`/`Ord` (#10558)

# Objective

- Follow up on https://github.com/bevyengine/bevy/pull/10519, diving deeper into optimising `Entity`, because the `derive`d `PartialOrd` `partial_cmp` does not produce optimal codegen: https://github.com/rust-lang/rust/issues/106107
- Fixes #2346.

## Solution

Given the previous PR's solution and the other existing LLVM codegen bug, there seemed to be a potential further optimisation possible with `Entity`. In exploring a manual `PartialOrd` impl, it turned out initially that the resulting codegen was not immediately better than the derived version. However, once `Entity` was given `#[repr(align(8))]`, the codegen improved remarkably, and even more so once the fields in `Entity` were rearranged to correspond to a `u64` layout (it seems Rust doesn't automatically reorder the fields this way). The field order and `align(8)` additions also improved `to_bits` codegen to be a single `mov` op. In turn, this led me to replace the previous "non-shortcircuiting" impl of `PartialEq::eq` with a direct `to_bits` comparison. The result was remarkably better codegen across the board, even for hashtable lookups.
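The approach described above can be sketched on the `FakeU64` stand-in type used for the codegen comparison. This is a minimal illustration, not the actual `Entity` source: the `new` constructor is a hypothetical helper added for the example, and placing `low` first only matches a `u64` layout on little-endian targets (the comparisons are correct on any target, since they go through `to_bits`).

```rust
use std::cmp::Ordering;

// Stand-in for `Entity`: two `u32` halves, aligned like a `u64`.
// `Eq` is derived as a marker; the comparison traits are written by hand.
#[derive(Clone, Copy, Eq, Debug)]
#[repr(align(8))]
pub struct FakeU64 {
    // `low` comes first so the in-memory layout matches a little-endian `u64`.
    low: u32,
    high: u32,
}

impl FakeU64 {
    // Hypothetical constructor, added only for this example.
    pub const fn new(high: u32, low: u32) -> Self {
        Self { low, high }
    }

    // Packs both halves into one `u64`, with `high` in the upper 32 bits.
    pub const fn to_bits(self) -> u64 {
        ((self.high as u64) << 32) | (self.low as u64)
    }
}

// Equality as a single 64-bit comparison instead of two 32-bit ones.
impl PartialEq for FakeU64 {
    fn eq(&self, other: &Self) -> bool {
        self.to_bits() == other.to_bits()
    }
}

// `partial_cmp` defers to `cmp`, so both orderings always agree.
impl PartialOrd for FakeU64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

// Ordering on the packed bits: `high` is compared first, then `low`.
impl Ord for FakeU64 {
    fn cmp(&self, other: &Self) -> Ordering {
        self.to_bits().cmp(&other.to_bits())
    }
}
```

Comparing the packed bits gives the same ordering as the derived impl on the original `high`, `low` field order, because `high` occupies the most significant bits.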
The current baseline codegen is as follows: https://godbolt.org/z/zTW1h8PnY

Assuming the following example struct that mirrors the existing `Entity` definition:

```rust
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord)]
pub struct FakeU64 {
    high: u32,
    low: u32,
}
```

the output for `to_bits` is as follows:

```
example::FakeU64::to_bits:
        shl     rdi, 32
        mov     eax, esi
        or      rax, rdi
        ret
```

Changing the struct to:

```rust
#[derive(Clone, Copy, Eq)]
#[repr(align(8))]
pub struct FakeU64 {
    low: u32,
    high: u32,
}
```

and providing manual implementations for `PartialEq`/`PartialOrd`/`Ord`, `to_bits` now optimises to:

```
example::FakeU64::to_bits:
        mov     rax, rdi
        ret
```

The full codegen example for this PR is here for reference: https://godbolt.org/z/n4Mjx165a

To highlight, `gt` comparison goes from

```
example::greater_than:
        cmp     edi, edx
        jae     .LBB3_2
        xor     eax, eax
        ret
.LBB3_2:
        setne   dl
        cmp     esi, ecx
        seta    al
        or      al, dl
        ret
```

to

```
example::greater_than:
        cmp     rdi, rsi
        seta    al
        ret
```

As explained on Discord by @scottmcm:

> The root issue here, as far as I understand it, is that LLVM's middle-end is inexplicably unwilling to merge loads if that would make them under-aligned. It leaves that entirely up to its target-specific back-end, and thus a bunch of the things that you'd expect it to do that would fix this just don't happen.

## Benchmarks

Before discussing benchmarks, everything was tested on the following specs:

- AMD Ryzen 7950X 16C/32T CPU
- 64GB 5200 RAM
- AMD RX7900XT 20GB Gfx card
- Manjaro KDE on Wayland

I made use of the new entity hashing benchmarks to see how this PR would improve things there. With the changes in place, I first did an implementation keeping the existing "non shortcircuit" `PartialEq` implementation in place, but with the alignment and field ordering changes, which in the benchmark is the `ord_shortcircuit` column. The `to_bits` `PartialEq` implementation is the `ord_to_bits` column.
The `main_ord` column is the existing baseline from the `main` branch.

![Screenshot_20231114_132908](https://github.com/bevyengine/bevy/assets/3116268/cb9090c9-ff74-4cc5-abae-8e4561332261)

My machine is not super set-up for benchmarking, so some results are within noise, but there is not only a clear improvement with the non-shortcircuiting implementation, but even further optimisation taking place with the `to_bits` implementation.

On my machine, a fair number of the stress tests were not showing any difference (indicating other bottlenecks), but I was able to get a clear difference with `many_foxes` with a fox count of 10,000.

Test with `cargo run --example many_foxes --features bevy/trace_tracy,wayland --release -- --count 10000`:

![Screenshot_20231114_144217](https://github.com/bevyengine/bevy/assets/3116268/89bdc21c-7209-43c8-85ae-efbf908bfed3)

On average, a framerate of about 28-29FPS was improved to 30-32FPS. "This trace" represents the current PR's perf, while "External trace" represents the `main` branch baseline.

## Changelog

Changed: micro-optimized `Entity` alignment and field ordering, and provided manual `PartialOrd`/`Ord` impls to help LLVM optimise further.

## Migration Guide

Any `unsafe` code relying on field ordering of `Entity` or sufficiently cursed shenanigans should change to reflect the different internal representation and alignment requirements of `Entity`.

Co-authored-by: james7132 <contact@jamessliu.com>
Co-authored-by: NathanW <nathansward@comcast.net>

2023-11-18 20:04:37 +00:00
..
bevy_a11y	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_animation	Add Debug, PartialEq and Eq derives to bevy_animation. (#10562 )	2023-11-15 14:05:04 +00:00
bevy_app	add regression test for #10385/#10389 (#10609 )	2023-11-18 12:53:09 +00:00
bevy_asset	Do not panic when failing to create assets folder (#10613 ) (#10614 )	2023-11-17 22:06:08 +00:00
bevy_audio	Add `VolumeLevel::ZERO` (#10608 )	2023-11-17 15:15:56 +00:00
bevy_core	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_core_pipeline	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_derive	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_diagnostic	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_dylib	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_dynamic_plugin	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_ecs	Optimise `Entity` with repr align & manual `PartialOrd`/`Ord` (#10558 )	2023-11-18 20:04:37 +00:00
bevy_ecs_compile_fail_tests	Updates for rust 1.73 (#10035 )	2023-10-06 00:31:10 +00:00
bevy_encase_derive	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_gilrs	Update `Event` send methods to return `EventId` (#10551 )	2023-11-16 17:20:43 +00:00
bevy_gizmos	Gizmo Arrows (#10550 )	2023-11-15 14:19:15 +00:00
bevy_gltf	Explicit color conversion methods (#10321 )	2023-11-15 16:47:32 +00:00
bevy_hierarchy	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_input	Update `Event` send methods to return `EventId` (#10551 )	2023-11-16 17:20:43 +00:00
bevy_internal	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_log	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_macro_utils	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_macros_compile_fail_tests	bevy_derive: Fix `#[deref]` breaking other attributes (#9551 )	2023-08-28 17:36:18 +00:00
bevy_math	Add and impl Primitives (#10580 )	2023-11-17 20:10:30 +00:00
bevy_mikktspace	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_pbr	Ensure ExtendedMaterial works with reflection (to enable bevy_egui_inspector integration) (#10548 )	2023-11-15 12:48:36 +00:00
bevy_ptr	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_reflect	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_reflect_compile_fail_tests	Improve TypeUuid's derive macro error messages (#9315 )	2023-10-02 12:42:01 +00:00
bevy_render	Explicit color conversion methods (#10321 )	2023-11-15 16:47:32 +00:00
bevy_scene	Use handles for queued scenes in SceneSpawner (#10619 )	2023-11-18 01:27:05 +00:00
bevy_sprite	Add PartialEq to Anchor (#10424 )	2023-11-07 08:36:10 +00:00
bevy_tasks	Make FakeTask public on singlethreaded context (#10517 )	2023-11-15 14:29:43 +00:00
bevy_text	Improved Text Rendering (#10537 )	2023-11-14 13:44:25 +00:00
bevy_time	Add discard_overstep function to Time<Fixed> (#10453 )	2023-11-18 00:50:26 +00:00
bevy_transform	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_ui	Fix panic when using image in UiMaterial (#10591 )	2023-11-16 21:31:25 +00:00
bevy_utils	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_window	Release 0.12 (#10362 )	2023-11-04 17:24:23 +00:00
bevy_winit	Revert `App::run()` behavior/Remove `winit` specific code from `bevy_app` (#10389 )	2023-11-16 21:50:17 +00:00