# Objective

- Follow up on https://github.com/bevyengine/bevy/pull/10519, diving deeper into optimising `Entity` due to the `derive`d `PartialOrd` `partial_cmp` not being optimal with codegen: https://github.com/rust-lang/rust/issues/106107
- Fixes #2346.

## Solution

Given the previous PR's solution and the other existing LLVM codegen bug, there seemed to be a potential further optimisation possible with `Entity`. In exploring a manual `PartialOrd` impl, it turned out initially that the resulting codegen was not immediately better than the derived version. However, once `Entity` was given `#[repr(align(8))]`, the codegen improved remarkably, and even more so once the fields in `Entity` were rearranged to correspond to a `u64` layout (Rust doesn't automatically reorder fields correctly, it seems). The field order and `align(8)` additions also improved `to_bits` codegen to be a single `mov` op. In turn, this led me to replace the previous "non-shortcircuiting" impl of `PartialEq::eq` with a direct `to_bits` comparison.

The result was remarkably better codegen across the board, even for hashtable lookups.

The current baseline codegen is as follows: https://godbolt.org/z/zTW1h8PnY

Assuming the following example struct that mirrors the existing `Entity` definition:

```rust
#[derive(Clone, Copy, Eq, PartialEq, PartialOrd, Ord)]
pub struct FakeU64 {
    high: u32,
    low: u32,
}
```

the output for `to_bits` is as follows:

```
example::FakeU64::to_bits:
        shl     rdi, 32
        mov     eax, esi
        or      rax, rdi
        ret
```

Changing the struct to:

```rust
#[derive(Clone, Copy, Eq)]
#[repr(align(8))]
pub struct FakeU64 {
    low: u32,
    high: u32,
}
```

and providing manual implementations for `PartialEq`/`PartialOrd`/`Ord`, `to_bits` now optimises to:

```
example::FakeU64::to_bits:
        mov     rax, rdi
        ret
```

The full codegen example for this PR is here for reference: https://godbolt.org/z/n4Mjx165a

To highlight, the `gt` comparison goes from

```
example::greater_than:
        cmp     edi, edx
        jae     .LBB3_2
        xor     eax, eax
        ret
.LBB3_2:
        setne   dl
        cmp     esi, ecx
        seta    al
        or      al, dl
        ret
```

to

```
example::greater_than:
        cmp     rdi, rsi
        seta    al
        ret
```

As explained on Discord by @scottmcm:

> The root issue here, as far as I understand it, is that LLVM's middle-end is inexplicably unwilling to merge loads if that would make them under-aligned. It leaves that entirely up to its target-specific back-end, and thus a bunch of the things that you'd expect it to do that would fix this just don't happen.

## Benchmarks

Before discussing benchmarks, everything was tested on the following specs:

- AMD Ryzen 7950X 16C/32T CPU
- 64GB 5200 RAM
- AMD RX7900XT 20GB Gfx card
- Manjaro KDE on Wayland

I made use of the new entity hashing benchmarks to see how this PR would improve things there. With the changes in place, I first did an implementation keeping the existing "non shortcircuit" `PartialEq` implementation in place, but with the alignment and field ordering changes, which in the benchmark is the `ord_shortcircuit` column. The `to_bits` `PartialEq` implementation is the `ord_to_bits` column. The `main_ord` column is the current baseline from the `main` branch.

My machine is not well set up for benchmarking, so some results are within noise, but there is not only a clear improvement with the non-shortcircuiting implementation, but even further optimisation taking place with the `to_bits` implementation.

On my machine, a fair number of the stress tests were not showing any difference (indicating other bottlenecks), but I was able to get a clear difference with `many_foxes` with a fox count of 10,000.

Test with `cargo run --example many_foxes --features bevy/trace_tracy,wayland --release -- --count 10000`:

On average, a framerate of about 28-29 FPS was improved to 30-32 FPS. "This trace" represents the current PR's perf, while "External trace" represents the `main` branch baseline.

## Changelog

Changed: micro-optimized `Entity` alignment and field ordering, and provided manual `PartialOrd`/`Ord` impls to help LLVM optimise further.

## Migration Guide

Any `unsafe` code relying on field ordering of `Entity` or sufficiently cursed shenanigans should change to reflect the different internal representation and alignment requirements of `Entity`.

Co-authored-by: james7132 <contact@jamessliu.com>
Co-authored-by: NathanW <nathansward@comcast.net>
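The Solution section above mentions manual `PartialEq`/`PartialOrd`/`Ord` implementations for the `FakeU64` example without showing them. A minimal sketch of what they could look like follows, assuming the `low`/`high` field order from the description. The `new` constructor is a hypothetical helper added for illustration, and the actual `Entity` code may differ.

```rust
use std::cmp::Ordering;

/// Stand-in for `Entity`, mirroring the `FakeU64` example from the description.
#[derive(Clone, Copy, Debug, Eq)]
#[repr(align(8))]
pub struct FakeU64 {
    low: u32,
    high: u32,
}

impl FakeU64 {
    /// Hypothetical constructor, present only for this sketch.
    pub const fn new(high: u32, low: u32) -> Self {
        Self { low, high }
    }

    /// Packs both halves into one `u64`, with `high` in the upper 32 bits.
    pub const fn to_bits(self) -> u64 {
        ((self.high as u64) << 32) | self.low as u64
    }
}

impl PartialEq for FakeU64 {
    fn eq(&self, other: &Self) -> bool {
        // Compare as a single 64-bit value instead of field by field.
        self.to_bits() == other.to_bits()
    }
}

impl PartialOrd for FakeU64 {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        // Delegate to `Ord` so both orderings always agree.
        Some(self.cmp(other))
    }
}

impl Ord for FakeU64 {
    fn cmp(&self, other: &Self) -> Ordering {
        // `high` occupies the upper bits, so it dominates the comparison.
        self.to_bits().cmp(&other.to_bits())
    }
}
```

Because the ordering is defined on the packed value, `high` is compared first and `low` only breaks ties, which matches what the derived implementation produced with `high` declared first.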
Bevy ECS
What is Bevy ECS?
Bevy ECS is an Entity Component System custom-built for the Bevy game engine. It aims to be simple to use, ergonomic, fast, massively parallel, opinionated, and featureful. It was created specifically for Bevy's needs, but it can easily be used as a standalone crate in other projects.
ECS
All app logic in Bevy uses the Entity Component System paradigm, which is often shortened to ECS. ECS is a software pattern that involves breaking your program up into Entities, Components, and Systems. Entities are unique "things" that are assigned groups of Components, which are then processed using Systems.
For example, one entity might have a `Position` and `Velocity` component, whereas another entity might have a `Position` and `UI` component. You might have a movement system that runs on all entities with a `Position` and `Velocity` component.
The ECS pattern encourages clean, decoupled designs by forcing you to break up your app data and logic into its core components. It also helps make your code faster by optimizing memory access patterns and making parallelism easier.
Concepts
Bevy ECS is Bevy's implementation of the ECS pattern. Unlike other Rust ECS implementations, which often require complex lifetimes, traits, builder patterns, or macros, Bevy ECS uses normal Rust data types for all of these concepts:
Components
Components are normal Rust structs. They are data stored in a `World` and specific instances of Components correlate to Entities.
use bevy_ecs::prelude::*;
#[derive(Component)]
struct Position { x: f32, y: f32 }
Worlds
Entities, Components, and Resources are stored in a `World`. Worlds, much like Rust std collections like `HashSet` and `Vec`, expose operations to insert, read, write, and remove the data they store.
use bevy_ecs::world::World;
let world = World::default();
Entities
Entities are unique identifiers that correlate to zero or more Components.
use bevy_ecs::prelude::*;
#[derive(Component)]
struct Position { x: f32, y: f32 }
#[derive(Component)]
struct Velocity { x: f32, y: f32 }
let mut world = World::new();
let entity = world
    .spawn((Position { x: 0.0, y: 0.0 }, Velocity { x: 1.0, y: 0.0 }))
    .id();
let entity_ref = world.entity(entity);
let position = entity_ref.get::<Position>().unwrap();
let velocity = entity_ref.get::<Velocity>().unwrap();
Systems
Systems are normal Rust functions. Thanks to the Rust type system, Bevy ECS can use function parameter types to determine what data needs to be sent to the system. It also uses this "data access" information to determine what Systems can run in parallel with each other.
use bevy_ecs::prelude::*;
#[derive(Component)]
struct Position { x: f32, y: f32 }
fn print_position(query: Query<(Entity, &Position)>) {
    for (entity, position) in &query {
        println!("Entity {:?} is at position: x {}, y {}", entity, position.x, position.y);
    }
}
Resources
Apps often require unique resources, such as asset collections, renderers, audio servers, time, etc. Bevy ECS makes this pattern a first class citizen. `Resource` is a special kind of component that does not belong to any entity. Instead, it is identified uniquely by its type:
use bevy_ecs::prelude::*;
#[derive(Resource, Default)]
struct Time {
    seconds: f32,
}
let mut world = World::new();
world.insert_resource(Time::default());
let time = world.get_resource::<Time>().unwrap();
// You can also access resources from Systems
fn print_time(time: Res<Time>) {
    println!("{}", time.seconds);
}
The `resources.rs` example illustrates how to read and write a `Counter` resource from Systems.
Schedules
Schedules run a set of Systems according to some execution strategy. Systems can be added to any number of System Sets, which are used to control their scheduling metadata.
The built-in "parallel executor" considers dependencies between systems and (by default) runs as many of them in parallel as possible. This maximizes performance while keeping system execution safe. To control the system ordering, define explicit dependencies between systems and their sets.
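To make the idea of scheduling from "data access" information concrete, here is a minimal standalone sketch of the usual conflict rule: two systems may run in parallel when neither writes data that the other reads or writes. The `Access` type, its fields, and `conflicts_with` are hypothetical names for this illustration and are not taken from Bevy ECS's internals.

```rust
use std::collections::HashSet;

/// Hypothetical summary of what one system touches, keyed by component name.
pub struct Access {
    reads: HashSet<&'static str>,
    writes: HashSet<&'static str>,
}

impl Access {
    pub fn new(reads: &[&'static str], writes: &[&'static str]) -> Self {
        Self {
            reads: reads.iter().copied().collect(),
            writes: writes.iter().copied().collect(),
        }
    }

    /// Two systems conflict if either one writes something the other reads or writes.
    /// Shared reads never conflict.
    pub fn conflicts_with(&self, other: &Access) -> bool {
        let writes_clash = |a: &Access, b: &Access| {
            a.writes
                .iter()
                .any(|c| b.reads.contains(c) || b.writes.contains(c))
        };
        writes_clash(self, other) || writes_clash(other, self)
    }
}
```

Under this rule, a `movement` system that writes `Position` cannot overlap with `print_position`, which reads it, while `print_time` only reads `Time` and is free to run alongside either.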
Using Bevy ECS
Bevy ECS should feel very natural for those familiar with Rust syntax:
use bevy_ecs::prelude::*;
#[derive(Component)]
struct Position { x: f32, y: f32 }
#[derive(Component)]
struct Velocity { x: f32, y: f32 }
// This system moves each entity with a Position and Velocity component
fn movement(mut query: Query<(&mut Position, &Velocity)>) {
    for (mut position, velocity) in &mut query {
        position.x += velocity.x;
        position.y += velocity.y;
    }
}
fn main() {
    // Create a new empty World to hold our Entities and Components
    let mut world = World::new();
    // Spawn an entity with Position and Velocity components
    world.spawn((
        Position { x: 0.0, y: 0.0 },
        Velocity { x: 1.0, y: 0.0 },
    ));
    // Create a new Schedule, which defines an execution strategy for Systems
    let mut schedule = Schedule::default();
    // Add our system to the schedule
    schedule.add_systems(movement);
    // Run the schedule once. If your app has a "loop", you would run this once per loop
    schedule.run(&mut world);
}
Features
Query Filters
use bevy_ecs::prelude::*;
#[derive(Component)]
struct Position { x: f32, y: f32 }
#[derive(Component)]
struct Player;
#[derive(Component)]
struct Alive;
// Gets the Position component of all Entities with Player component and without the Alive
// component.
fn system(query: Query<&Position, (With<Player>, Without<Alive>)>) {
    for position in &query {
    }
}
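As a rough mental model of what the `With`/`Without` filter above selects, the following standalone sketch checks a set of component names against a filter. `Filter` and `matches` are hypothetical names for this illustration; Bevy ECS resolves filters through its type system rather than through strings.

```rust
use std::collections::HashSet;

/// Hypothetical filter: component names that must be present or must be absent.
pub struct Filter {
    pub with: Vec<&'static str>,
    pub without: Vec<&'static str>,
}

impl Filter {
    /// An entity matches when it has every `with` component and none of the `without` ones.
    pub fn matches(&self, components: &HashSet<&'static str>) -> bool {
        self.with.iter().all(|c| components.contains(c))
            && !self.without.iter().any(|c| components.contains(c))
    }
}
```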
Change Detection
Bevy ECS tracks all changes to Components and Resources.
Queries can filter for changed Components:
use bevy_ecs::prelude::*;
#[derive(Component)]
struct Position { x: f32, y: f32 }
#[derive(Component)]
struct Velocity { x: f32, y: f32 }
// Gets the Position component of all Entities whose Velocity has changed since the last run of the System
fn system_changed(query: Query<&Position, Changed<Velocity>>) {
    for position in &query {
    }
}
// Gets the Position component of all Entities that had a Velocity component added since the last run of the System
fn system_added(query: Query<&Position, Added<Velocity>>) {
    for position in &query {
    }
}
Resources also expose change state:
use bevy_ecs::prelude::*;
#[derive(Resource)]
struct Time(f32);
// Prints "time changed!" if the Time resource has changed since the last run of the System
fn system(time: Res<Time>) {
    if time.is_changed() {
        println!("time changed!");
    }
}
The `change_detection.rs` example shows how to query only for updated entities and react to changes in resources.
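One common way to implement "changed since the last run" is to stamp each value with the tick at which it was last written and compare that against the tick at which the observer last ran. The standalone sketch below illustrates that idea only. `Tracked` and its methods are hypothetical names, tick wraparound is ignored, and none of this is taken from Bevy ECS's source.

```rust
/// Hypothetical wrapper that remembers the tick of the most recent write.
pub struct Tracked<T> {
    value: T,
    changed_tick: u32,
}

impl<T> Tracked<T> {
    /// A freshly inserted value counts as changed at the tick it was created.
    pub fn new(value: T, now: u32) -> Self {
        Self { value, changed_tick: now }
    }

    /// Shared access never marks the value as changed.
    pub fn get(&self) -> &T {
        &self.value
    }

    /// Mutable access records the current tick, whether or not the value ends up different.
    pub fn get_mut(&mut self, now: u32) -> &mut T {
        self.changed_tick = now;
        &mut self.value
    }

    /// True if the value was written after the observer last ran.
    pub fn is_changed(&self, last_run: u32) -> bool {
        self.changed_tick > last_run
    }
}
```

A consequence of this design is that detection is conservative: taking mutable access counts as a change even if the same value is written back.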
Component Storage
Bevy ECS supports multiple component storage types.
Components can be stored in:
- Tables: Fast and cache friendly iteration, but slower adding and removing of components. This is the default storage type.
- Sparse Sets: Fast adding and removing of components, but slower iteration.
Component storage types are configurable, and they default to table storage if the storage is not manually defined.
use bevy_ecs::prelude::*;
#[derive(Component)]
struct TableStoredComponent;
#[derive(Component)]
#[component(storage = "SparseSet")]
struct SparseStoredComponent;
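To illustrate why sparse sets make adding and removing components cheap, here is a minimal standalone sparse set keyed by entity index. It is a sketch of the general data structure, not Bevy ECS's storage code, and every name in it is hypothetical.

```rust
/// Minimal sparse set keyed by entity index.
pub struct SparseSet<T> {
    /// Entity index -> position in `dense`, if the entity has a value.
    sparse: Vec<Option<usize>>,
    /// Position in `dense` -> entity index.
    entities: Vec<usize>,
    /// Tightly packed values.
    dense: Vec<T>,
}

impl<T> SparseSet<T> {
    pub fn new() -> Self {
        Self { sparse: Vec::new(), entities: Vec::new(), dense: Vec::new() }
    }

    /// Inserts or overwrites the value for `entity` without moving other values.
    pub fn insert(&mut self, entity: usize, value: T) {
        if entity >= self.sparse.len() {
            self.sparse.resize(entity + 1, None);
        }
        let slot = self.sparse[entity];
        match slot {
            Some(i) => self.dense[i] = value,
            None => {
                self.sparse[entity] = Some(self.dense.len());
                self.entities.push(entity);
                self.dense.push(value);
            }
        }
    }

    pub fn get(&self, entity: usize) -> Option<&T> {
        let i = self.sparse.get(entity).copied().flatten()?;
        self.dense.get(i)
    }

    /// Removes in constant time by swapping the last value into the freed position.
    pub fn remove(&mut self, entity: usize) -> Option<T> {
        let i = self.sparse.get_mut(entity)?.take()?;
        let value = self.dense.swap_remove(i);
        self.entities.swap_remove(i);
        // If another value was moved into position `i`, repoint its sparse entry.
        if let Some(&moved) = self.entities.get(i) {
            self.sparse[moved] = Some(i);
        }
        Some(value)
    }
}
```

Insertion and removal touch only one or two values, whereas iteration over several sparse-stored components needs an indirection through `sparse` for each entity, which is the trade-off described in the list above.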
Component Bundles
Define sets of Components that should be added together.
use bevy_ecs::prelude::*;
#[derive(Default, Component)]
struct Player;
#[derive(Default, Component)]
struct Position { x: f32, y: f32 }
#[derive(Default, Component)]
struct Velocity { x: f32, y: f32 }
#[derive(Bundle, Default)]
struct PlayerBundle {
    player: Player,
    position: Position,
    velocity: Velocity,
}
let mut world = World::new();
// Spawn a new entity and insert the default PlayerBundle
world.spawn(PlayerBundle::default());
// Bundles play well with Rust's struct update syntax
world.spawn(PlayerBundle {
    position: Position { x: 1.0, y: 1.0 },
    ..Default::default()
});
Events
Events offer a communication channel between one or more systems. Events can be sent using the system parameter `EventWriter` and received with `EventReader`.
use bevy_ecs::prelude::*;
#[derive(Event)]
struct MyEvent {
    message: String,
}
fn writer(mut writer: EventWriter<MyEvent>) {
    writer.send(MyEvent {
        message: "hello!".to_string(),
    });
}
fn reader(mut reader: EventReader<MyEvent>) {
    for event in reader.iter() {
    }
}
A minimal setup using events can be seen in `events.rs`.
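To show how one writer can feed several independent readers, the following standalone sketch keeps events in an append-only log and gives each reader its own cursor. The `Events` and `Reader` types here are hypothetical simplifications: the log never discards old events, which a real implementation would need to do, and the API is not the one Bevy ECS exposes.

```rust
/// Hypothetical append-only event log.
pub struct Events<E> {
    log: Vec<E>,
}

impl<E> Events<E> {
    pub fn new() -> Self {
        Self { log: Vec::new() }
    }

    pub fn send(&mut self, event: E) {
        self.log.push(event);
    }
}

/// Each reader tracks how many events it has already seen,
/// so several readers can observe the same event independently.
#[derive(Default)]
pub struct Reader {
    seen: usize,
}

impl Reader {
    /// Returns the events sent since this reader's previous call.
    pub fn read<'a, E>(&mut self, events: &'a Events<E>) -> &'a [E] {
        let unread = &events.log[self.seen..];
        self.seen = events.log.len();
        unread
    }
}
```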