forestia/bevy

Author	SHA1	Message	Date
Mike	e1b0bbf5ed	Stageless: add a method to scope to always run a task on the scope thread (#7415 ) # Objective - Currently exclusive systems and applying buffers run outside of the multithreaded executor and just call the functions on the thread the schedule is running on. Stageless changes this to run these using tasks in a scope. Specifically, it uses `spawn_on_scope` to run these. For the render thread this is incorrect, as calling `spawn_on_scope` there runs tasks on the main thread. It should instead run these on the render thread and only run nonsend systems on the main thread. ## Solution - Add another executor to `Scope` for spawning tasks on the scope. `spawn_on_scope` now always runs the task on the thread the scope is running on. `spawn_on_external` spawns onto the external executor that is optionally passed in. If None is passed, `spawn_on_external` will spawn onto the scope executor. - Eventually this new machinery will be able to be removed. This will happen once a fix for removing NonSend resources from the world lands. So this is a temporary fix to support stageless. --- ## Changelog - add a spawn_on_external method to allow spawning on the scope's thread or an external thread ## Migration Guide > No migration guide. The main thread executor was introduced in pipelined rendering which was merged for 0.10. spawn_on_scope now behaves the same way as on 0.9.	2023-02-05 21:44:46 +00:00
Mike	4f3ed196fa	Stageless: move MainThreadExecutor to schedule_v3 (#7444 ) # Objective - Trying to move some of the fixes from https://github.com/bevyengine/bevy/pull/7267 to make that one easier to review - The MainThreadExecutor is how the render world runs nonsend systems on the main thread for pipelined rendering. - The multithread executor for stageless wasn't using the MainThreadExecutor. - MainThreadExecutor was declared in the old executor_parallel module that is getting deleted. - The way the MainThreadExecutor was getting passed to the scope was actually unsound as the resource could be dropped from the World while the schedule was running ## Solution - Move MainThreadExecutor to the new multithreaded_executor's file. - Make the multithreaded executor use the MainThreadExecutor - Clone the MainThreadExecutor onto the stack and pass that ref in ## Changelog - Move MainThreadExecutor for stageless migration.	2023-02-03 03:19:41 +00:00
Mike	2027af4c54	Pipelined Rendering (#6503 ) # Objective - Implement pipelined rendering - Fixes #5082 - Fixes #4718 ## User Facing Description Bevy now implements pipelined rendering! Pipelined rendering allows the app logic and rendering logic to run on different threads, leading to large gains in performance. ![image](https://user-images.githubusercontent.com/2180432/202049871-3c00b801-58ab-448f-93fd-471e30aba55f.png) tracy capture of many_foxes example To use pipelined rendering, you just need to add the `PipelinedRenderingPlugin`. If you're using `DefaultPlugins` then it will automatically be added for you on all platforms except wasm. Bevy does not currently support multithreading on wasm, which is needed for this feature to work. If you aren't using `DefaultPlugins` you can add the plugin manually. ```rust use bevy::prelude::*; use bevy::render::pipelined_rendering::PipelinedRenderingPlugin; fn main() { App::new() // whatever other plugins you need .add_plugin(RenderPlugin) // needs to be added after RenderPlugin .add_plugin(PipelinedRenderingPlugin) .run(); } ``` If for some reason pipelined rendering needs to be removed, you can disable the plugin the normal way. ```rust use bevy::prelude::*; use bevy::render::pipelined_rendering::PipelinedRenderingPlugin; fn main() { App::new().add_plugins(DefaultPlugins.build().disable::<PipelinedRenderingPlugin>()); } ``` ### A setup function was added to plugins An optional plugin lifecycle function was added to the `Plugin` trait. This function is called after all plugins have been built, but before the app runner is called. This allows for some final setup to be done. In the case of pipelined rendering, the function removes the sub app from the main app and sends it to the render thread. 
```rust struct MyPlugin; impl Plugin for MyPlugin { fn build(&self, app: &mut App) { } // optional function fn setup(&self, app: &mut App) { // do some final setup before runner is called } } ``` ### A Stage for Frame Pacing In the `RenderExtractApp` there is a stage labelled `BeforeIoAfterRenderStart` that systems can be added to. The specific use case for this stage is for a frame pacing system that can delay the start of main app processing in render bound apps to reduce input latency i.e. "frame pacing". This is not currently built into bevy, but exists as `bevy` ```text \|-------------------------------------------------------------------\| \| \| BeforeIoAfterRenderStart \| winit events \| main schedule \| \| extract \|---------------------------------------------------------\| \| \| extract commands \| rendering schedule \| \|-------------------------------------------------------------------\| ``` ### Small API additions * `Schedule::remove_stage` * `App::insert_sub_app` * `App::remove_sub_app` * `TaskPool::scope_with_executor` ## Problems and Solutions ### Moving render app to another thread Most of the hard bits for this were done with the render redo. This PR just sends the render app back and forth through channels which seems to work ok. I originally experimented with using a scope to run the render task. It was cuter, but that approach didn't allow render to start before i/o processing. So I switched to using channels. There is much complexity in the coordination that needs to be done, but it's worth it. By moving rendering during i/o processing the frame times should be much more consistent in render bound apps. See https://github.com/bevyengine/bevy/issues/4691. ### Unsoundness with Sending World with NonSend resources Dropping !Send things on threads other than the thread they were spawned on is considered unsound. The render world doesn't have any nonsend resources. 
So if we tell the users to "pretty please don't spawn nonsend resource on the render world", we can avoid this problem. More seriously there is this https://github.com/bevyengine/bevy/pull/6534 pr, which patches the unsoundness by aborting the app if a nonsend resource is dropped on the wrong thread. ~~That PR should probably be merged before this one.~~ For a longer term solution we have this discussion going https://github.com/bevyengine/bevy/discussions/6552. ### NonSend Systems in render world The render world doesn't have any !Send resources, but it does have a non send system. While Window is Send, winit does have some APIs that can only be accessed on the main thread. `prepare_windows` in the render schedule thus needs to be scheduled on the main thread. Currently we run nonsend systems by running them on the thread the TaskPool::scope runs on. When we move render to another thread this no longer works. To fix this, a new `scope_with_executor` method was added that takes an optional `ThreadExecutor` that can only be ticked on the thread it was initialized on. The render world then holds a `MainThreadExecutor` resource which can be passed to the scope in the parallel executor that it uses to spawn its non send systems on. ### Scopes executors between render and main should not share tasks The render world and the app world share the `ComputeTaskPool`. Because `scope` has executors for the ComputeTaskPool, a system from the main world could run on the render thread or a render system could run on the main thread. This can cause performance problems because it can delay a stage from finishing. See https://github.com/bevyengine/bevy/pull/6503#issuecomment-1309791442 for more details. To avoid this problem, `TaskPool::scope` has been changed to not tick the ComputeTaskPool when it's used by the parallel executor. 
In the future when we move closer to the 1 thread to 1 logical core model we may want to overprovide threads, because the render and main app threads don't do much when executing the schedule. ## Performance My machine is Windows 11, AMD Ryzen 5600x, RX 6600 ### Examples #### This PR with pipelining vs Main > Note that these were run on an older version of main and the performance profile has probably changed due to optimizations Seeing a perf gain from 29% on many lights to 7% on many sprites. <html> <body> <!--StartFragment--><google-sheets-html-origin> \| percent \| \| \| Diff \| \| \| Main \| \| \| PR \| \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- tracy frame time \| mean \| median \| sigma \| mean \| median \| sigma \| mean \| median \| sigma \| mean \| median \| sigma many foxes \| 27.01% \| 27.34% \| -47.09% \| 1.58 \| 1.55 \| -1.78 \| 5.85 \| 5.67 \| 3.78 \| 4.27 \| 4.12 \| 5.56 many lights \| 29.35% \| 29.94% \| -10.84% \| 3.02 \| 3.03 \| -0.57 \| 10.29 \| 10.12 \| 5.26 \| 7.27 \| 7.09 \| 5.83 many animated sprites \| 13.97% \| 15.69% \| 14.20% \| 3.79 \| 4.17 \| 1.41 \| 27.12 \| 26.57 \| 9.93 \| 23.33 \| 22.4 \| 8.52 3d scene \| 25.79% \| 26.78% \| 7.46% \| 0.49 \| 0.49 \| 0.15 \| 1.9 \| 1.83 \| 2.01 \| 1.41 \| 1.34 \| 1.86 many cubes \| 11.97% \| 11.28% \| 14.51% \| 1.93 \| 1.78 \| 1.31 \| 16.13 \| 15.78 \| 9.03 \| 14.2 \| 14 \| 7.72 many sprites \| 7.14% \| 9.42% \| -85.42% \| 1.72 \| 2.23 \| -6.15 \| 24.09 \| 23.68 \| 7.2 \| 22.37 \| 21.45 \| 13.35 <!--EndFragment--> </body> </html> #### This PR with pipelining disabled vs Main Mostly regressions here. I don't think this should be a problem as users that are disabling pipelined rendering are probably running single threaded and not using the parallel executor. The regression is probably mostly due to the switch to use `async_executor::run` instead of `try_tick` and also having one less thread to run systems on. 
I'll do a writeup on why switching to `run` causes regressions, so we can try to eventually fix it. Using try_tick causes issues when pipelined rendering is enabled, as seen [here](https://github.com/bevyengine/bevy/pull/6503#issuecomment-1380803518) <html> <body> <!--StartFragment--><google-sheets-html-origin> \| percent \| \| \| Diff \| \| \| Main \| \| \| PR no pipelining \| \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- \| -- tracy frame time \| mean \| median \| sigma \| mean \| median \| sigma \| mean \| median \| sigma \| mean \| median \| sigma many foxes \| -3.72% \| -4.42% \| -1.07% \| -0.21 \| -0.24 \| -0.04 \| 5.64 \| 5.43 \| 3.74 \| 5.85 \| 5.67 \| 3.78 many lights \| 0.29% \| -0.30% \| 4.75% \| 0.03 \| -0.03 \| 0.25 \| 10.29 \| 10.12 \| 5.26 \| 10.26 \| 10.15 \| 5.01 many animated sprites \| 0.22% \| 1.81% \| -2.72% \| 0.06 \| 0.48 \| -0.27 \| 27.12 \| 26.57 \| 9.93 \| 27.06 \| 26.09 \| 10.2 3d scene \| -15.79% \| -14.75% \| -31.34% \| -0.3 \| -0.27 \| -0.63 \| 1.9 \| 1.83 \| 2.01 \| 2.2 \| 2.1 \| 2.64 many cubes \| -2.85% \| -3.30% \| 0.00% \| -0.46 \| -0.52 \| 0 \| 16.13 \| 15.78 \| 9.03 \| 16.59 \| 16.3 \| 9.03 many sprites \| 2.49% \| 2.41% \| 0.69% \| 0.6 \| 0.57 \| 0.05 \| 24.09 \| 23.68 \| 7.2 \| 23.49 \| 23.11 \| 7.15 <!--EndFragment--> </body> </html> ### Benchmarks Mostly the same except empty_systems has got a touch slower. The maybe_pipelining+1 column has the compute task pool with an extra thread over default added. This is because pipelining loses one thread over main to execute systems on, since the main thread no longer runs normal systems. <details> <summary>Click Me</summary> ```text group main maybe-pipelining+1 ----- ------------------------- ------------------ busy_systems/01x_entities_03_systems 1.07 30.7±1.32µs ? ?/sec 1.00 28.6±1.35µs ? ?/sec busy_systems/01x_entities_06_systems 1.10 52.1±1.10µs ? ?/sec 1.00 47.2±1.08µs ? ?/sec busy_systems/01x_entities_09_systems 1.00 74.6±1.36µs ? 
?/sec 1.00 75.0±1.93µs ? ?/sec busy_systems/01x_entities_12_systems 1.03 100.6±6.68µs ? ?/sec 1.00 98.0±1.46µs ? ?/sec busy_systems/01x_entities_15_systems 1.11 128.5±3.53µs ? ?/sec 1.00 115.5±1.02µs ? ?/sec busy_systems/02x_entities_03_systems 1.16 50.4±2.56µs ? ?/sec 1.00 43.5±3.00µs ? ?/sec busy_systems/02x_entities_06_systems 1.00 87.1±1.27µs ? ?/sec 1.05 91.5±7.15µs ? ?/sec busy_systems/02x_entities_09_systems 1.04 139.9±6.37µs ? ?/sec 1.00 134.0±1.06µs ? ?/sec busy_systems/02x_entities_12_systems 1.05 179.2±3.47µs ? ?/sec 1.00 170.1±3.17µs ? ?/sec busy_systems/02x_entities_15_systems 1.01 219.6±3.75µs ? ?/sec 1.00 218.1±2.55µs ? ?/sec busy_systems/03x_entities_03_systems 1.10 70.6±2.33µs ? ?/sec 1.00 64.3±0.69µs ? ?/sec busy_systems/03x_entities_06_systems 1.02 130.2±3.11µs ? ?/sec 1.00 128.0±1.34µs ? ?/sec busy_systems/03x_entities_09_systems 1.00 195.0±10.11µs ? ?/sec 1.00 194.8±1.41µs ? ?/sec busy_systems/03x_entities_12_systems 1.01 261.7±4.05µs ? ?/sec 1.00 259.8±4.11µs ? ?/sec busy_systems/03x_entities_15_systems 1.00 318.0±3.04µs ? ?/sec 1.06 338.3±20.25µs ? ?/sec busy_systems/04x_entities_03_systems 1.00 82.9±0.63µs ? ?/sec 1.02 84.3±0.63µs ? ?/sec busy_systems/04x_entities_06_systems 1.01 181.7±3.65µs ? ?/sec 1.00 179.8±1.76µs ? ?/sec busy_systems/04x_entities_09_systems 1.04 265.0±4.68µs ? ?/sec 1.00 255.3±1.98µs ? ?/sec busy_systems/04x_entities_12_systems 1.00 335.9±3.00µs ? ?/sec 1.05 352.6±15.84µs ? ?/sec busy_systems/04x_entities_15_systems 1.00 418.6±10.26µs ? ?/sec 1.08 450.2±39.58µs ? ?/sec busy_systems/05x_entities_03_systems 1.07 114.3±0.95µs ? ?/sec 1.00 106.9±1.52µs ? ?/sec busy_systems/05x_entities_06_systems 1.08 229.8±2.90µs ? ?/sec 1.00 212.3±4.18µs ? ?/sec busy_systems/05x_entities_09_systems 1.03 329.3±1.99µs ? ?/sec 1.00 319.2±2.43µs ? ?/sec busy_systems/05x_entities_12_systems 1.06 454.7±6.77µs ? ?/sec 1.00 430.1±3.58µs ? ?/sec busy_systems/05x_entities_15_systems 1.03 554.6±6.15µs ? ?/sec 1.00 538.4±23.87µs ? 
?/sec contrived/01x_entities_03_systems 1.00 14.0±0.15µs ? ?/sec 1.08 15.1±0.21µs ? ?/sec contrived/01x_entities_06_systems 1.04 28.5±0.37µs ? ?/sec 1.00 27.4±0.44µs ? ?/sec contrived/01x_entities_09_systems 1.00 41.5±4.38µs ? ?/sec 1.02 42.2±2.24µs ? ?/sec contrived/01x_entities_12_systems 1.06 55.9±1.49µs ? ?/sec 1.00 52.6±1.36µs ? ?/sec contrived/01x_entities_15_systems 1.02 68.0±2.00µs ? ?/sec 1.00 66.5±0.78µs ? ?/sec contrived/02x_entities_03_systems 1.03 25.2±0.38µs ? ?/sec 1.00 24.6±0.52µs ? ?/sec contrived/02x_entities_06_systems 1.00 46.3±0.49µs ? ?/sec 1.04 48.1±4.13µs ? ?/sec contrived/02x_entities_09_systems 1.02 70.4±0.99µs ? ?/sec 1.00 68.8±1.04µs ? ?/sec contrived/02x_entities_12_systems 1.06 96.8±1.49µs ? ?/sec 1.00 91.5±0.93µs ? ?/sec contrived/02x_entities_15_systems 1.02 116.2±0.95µs ? ?/sec 1.00 114.2±1.42µs ? ?/sec contrived/03x_entities_03_systems 1.00 33.2±0.38µs ? ?/sec 1.01 33.6±0.45µs ? ?/sec contrived/03x_entities_06_systems 1.00 62.4±0.73µs ? ?/sec 1.01 63.3±1.05µs ? ?/sec contrived/03x_entities_09_systems 1.02 96.4±0.85µs ? ?/sec 1.00 94.8±3.02µs ? ?/sec contrived/03x_entities_12_systems 1.01 126.3±4.67µs ? ?/sec 1.00 125.6±2.27µs ? ?/sec contrived/03x_entities_15_systems 1.03 160.2±9.37µs ? ?/sec 1.00 156.0±1.53µs ? ?/sec contrived/04x_entities_03_systems 1.02 41.4±3.39µs ? ?/sec 1.00 40.5±0.52µs ? ?/sec contrived/04x_entities_06_systems 1.00 78.9±1.61µs ? ?/sec 1.02 80.3±1.06µs ? ?/sec contrived/04x_entities_09_systems 1.02 121.8±3.97µs ? ?/sec 1.00 119.2±1.46µs ? ?/sec contrived/04x_entities_12_systems 1.00 157.8±1.48µs ? ?/sec 1.01 160.1±1.72µs ? ?/sec contrived/04x_entities_15_systems 1.00 197.9±1.47µs ? ?/sec 1.08 214.2±34.61µs ? ?/sec contrived/05x_entities_03_systems 1.00 49.1±0.33µs ? ?/sec 1.01 49.7±0.75µs ? ?/sec contrived/05x_entities_06_systems 1.00 95.0±0.93µs ? ?/sec 1.00 94.6±0.94µs ? ?/sec contrived/05x_entities_09_systems 1.01 143.2±1.68µs ? ?/sec 1.00 142.2±2.00µs ? 
?/sec contrived/05x_entities_12_systems 1.00 191.8±2.03µs ? ?/sec 1.01 192.7±7.88µs ? ?/sec contrived/05x_entities_15_systems 1.02 239.7±3.71µs ? ?/sec 1.00 235.8±4.11µs ? ?/sec empty_systems/000_systems 1.01 47.8±0.67ns ? ?/sec 1.00 47.5±2.02ns ? ?/sec empty_systems/001_systems 1.00 1743.2±126.14ns ? ?/sec 1.01 1761.1±70.10ns ? ?/sec empty_systems/002_systems 1.01 2.2±0.04µs ? ?/sec 1.00 2.2±0.02µs ? ?/sec empty_systems/003_systems 1.02 2.7±0.09µs ? ?/sec 1.00 2.7±0.16µs ? ?/sec empty_systems/004_systems 1.00 3.1±0.11µs ? ?/sec 1.00 3.1±0.24µs ? ?/sec empty_systems/005_systems 1.00 3.5±0.05µs ? ?/sec 1.11 3.9±0.70µs ? ?/sec empty_systems/010_systems 1.00 5.5±0.12µs ? ?/sec 1.03 5.7±0.17µs ? ?/sec empty_systems/015_systems 1.00 7.9±0.19µs ? ?/sec 1.06 8.4±0.16µs ? ?/sec empty_systems/020_systems 1.00 10.4±1.25µs ? ?/sec 1.02 10.6±0.18µs ? ?/sec empty_systems/025_systems 1.00 12.4±0.39µs ? ?/sec 1.14 14.1±1.07µs ? ?/sec empty_systems/030_systems 1.00 15.1±0.39µs ? ?/sec 1.05 15.8±0.62µs ? ?/sec empty_systems/035_systems 1.00 16.9±0.47µs ? ?/sec 1.07 18.0±0.37µs ? ?/sec empty_systems/040_systems 1.00 19.3±0.41µs ? ?/sec 1.05 20.3±0.39µs ? ?/sec empty_systems/045_systems 1.00 22.4±1.67µs ? ?/sec 1.02 22.9±0.51µs ? ?/sec empty_systems/050_systems 1.00 24.4±1.67µs ? ?/sec 1.01 24.7±0.40µs ? ?/sec empty_systems/055_systems 1.05 28.6±5.27µs ? ?/sec 1.00 27.2±0.70µs ? ?/sec empty_systems/060_systems 1.02 29.9±1.64µs ? ?/sec 1.00 29.3±0.66µs ? ?/sec empty_systems/065_systems 1.02 32.7±3.15µs ? ?/sec 1.00 32.1±0.98µs ? ?/sec empty_systems/070_systems 1.00 33.0±1.42µs ? ?/sec 1.03 34.1±1.44µs ? ?/sec empty_systems/075_systems 1.00 34.8±0.89µs ? ?/sec 1.04 36.2±0.70µs ? ?/sec empty_systems/080_systems 1.00 37.0±1.82µs ? ?/sec 1.05 38.7±1.37µs ? ?/sec empty_systems/085_systems 1.00 38.7±0.76µs ? ?/sec 1.05 40.8±0.83µs ? ?/sec empty_systems/090_systems 1.00 41.5±1.09µs ? ?/sec 1.04 43.2±0.82µs ? ?/sec empty_systems/095_systems 1.00 43.6±1.10µs ? ?/sec 1.04 45.2±0.99µs ? 
?/sec empty_systems/100_systems 1.00 46.7±2.27µs ? ?/sec 1.03 48.1±1.25µs ? ?/sec ``` </details> ## Migration Guide ### App `runner` and SubApp `extract` functions are now required to be Send This was changed to enable pipelined rendering. If this breaks your use case please report it as these new bounds might be able to be relaxed. ## ToDo * [x] redo benchmarking * [x] reinvestigate the perf of the try_tick -> run change for task pool scope	2023-01-19 23:45:46 +00:00
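The `ThreadExecutor` idea from the Pipelined Rendering description above (work may be queued from any thread, but only the thread that created the executor may run it) can be sketched with the standard library alone. This is a simplified stand-in under stated assumptions, not Bevy's implementation: `ThreadBoundQueue`, `run_pending`, and `demo` are hypothetical names, and it runs plain closures rather than futures.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread::{self, ThreadId};

/// A queued unit of work. It must be `Send` because other threads may enqueue it.
type Job = Box<dyn FnOnce() + Send>;

/// Hypothetical stand-in for a thread-bound executor: any thread may enqueue
/// jobs, but only the thread that created the queue may run them.
struct ThreadBoundQueue {
    owner: ThreadId,
    jobs: Mutex<VecDeque<Job>>,
}

impl ThreadBoundQueue {
    fn new() -> Self {
        Self {
            owner: thread::current().id(),
            jobs: Mutex::new(VecDeque::new()),
        }
    }

    /// Enqueue a job from any thread.
    fn spawn(&self, job: impl FnOnce() + Send + 'static) {
        self.jobs.lock().unwrap().push_back(Box::new(job));
    }

    /// Run every queued job, but only when called on the owning thread.
    /// Returns `None` (and runs nothing) on any other thread.
    fn run_pending(&self) -> Option<usize> {
        if thread::current().id() != self.owner {
            return None;
        }
        let mut ran = 0;
        loop {
            // The lock guard is dropped at the end of this statement, so a
            // job is free to enqueue more work without deadlocking.
            let next = self.jobs.lock().unwrap().pop_front();
            match next {
                Some(job) => {
                    job();
                    ran += 1;
                }
                None => return Some(ran),
            }
        }
    }
}

/// Enqueue a job from a worker thread, then try to run it from both threads.
fn demo() -> (Option<usize>, Option<usize>, bool) {
    let owner_id = thread::current().id();
    let queue = Arc::new(ThreadBoundQueue::new());
    let ran_on = Arc::new(Mutex::new(None::<ThreadId>));

    let worker = {
        let queue = Arc::clone(&queue);
        let ran_on = Arc::clone(&ran_on);
        thread::spawn(move || {
            queue.spawn(move || {
                *ran_on.lock().unwrap() = Some(thread::current().id());
            });
            // The worker does not own the queue, so this runs nothing.
            queue.run_pending()
        })
    };

    let from_worker = worker.join().unwrap();
    let from_owner = queue.run_pending();
    let ran_on_owner = *ran_on.lock().unwrap() == Some(owner_id);
    (from_worker, from_owner, ran_on_owner)
}

fn main() {
    let (from_worker, from_owner, ran_on_owner) = demo();
    println!("worker: {from_worker:?}, owner: {from_owner:?}, ran on owner: {ran_on_owner}");
}
```

The owner check is what lets a schedule running on another thread hand main-thread-only work back to the thread that created the queue, which is the role the description gives to `MainThreadExecutor` for the render world.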
Jakob Hellermann	e71c4d2802	fix nightly clippy warnings (#6395 ) # Objective - fix new clippy lints before they get stable and break CI ## Solution - run `clippy --fix` to auto-fix machine-applicable lints - silence `clippy::should_implement_trait` for `fn HandleId::default<T: Asset>` ## Changes - always prefer `format!("{inline}")` over `format!("{}", not_inline)` - prefer `Box::default` (or `Box::<T>::default` if necessary) over `Box::new(T::default())`	2022-10-28 21:03:01 +00:00
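The two style changes listed in the clippy commit above can be illustrated with a short standard-library sketch. The function names here are hypothetical and exist only for illustration.

```rust
/// Inlined format arguments: `{stage}` and `{systems}` are captured from scope
/// instead of being passed positionally as `format!("{} runs {} systems", stage, systems)`.
fn describe(stage: &str, systems: usize) -> String {
    format!("{stage} runs {systems} systems")
}

/// `Box::default()` replaces `Box::new(Vec::default())`; the element type is
/// inferred from the return type.
fn empty_queue() -> Box<Vec<u32>> {
    Box::default()
}

fn main() {
    println!("{}", describe("Update", 3));
    println!("queue length: {}", empty_queue().len());
}
```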
Mike	d22d310ad5	Nested spawns on scope (#4466 ) # Objective - Add ability to create nested spawns. This is needed for stageless. The current executor spawns tasks for each system early and runs the system by communicating through a channel. In stageless we want to spawn the task late, so that archetypes can be updated right before the task is run. The executor is run on a separate task, so this enables the scope to be passed to the spawned executor. - Fixes #4301 ## Solution - Instantiate a single threaded executor on the scope and use that instead of the LocalExecutor. This allows the scope to be Send, but still able to spawn tasks onto the main thread the scope is run on. This works because, while systems can access nonsend data, the systems themselves are Send. Because of this change we lose the ability to spawn nonsend tasks on the scope, but I don't think this is being used anywhere. Users would still be able to use spawn_local on TaskPools. - Steals the lifetime tricks that `std::thread::scope` uses to allow nested spawns, but disallow scope to be passed to tasks or threads not associated with the scope. - Change the storage for the tasks to a `ConcurrentQueue`. This is to allow a &Scope to be passed for spawning instead of a &mut Scope. `ConcurrentQueue` was chosen because it was already in our dependency tree because `async_executor` depends on it. - Removed the optimizations for 0 and 1 spawned tasks. They did improve those cases, but made the cases of more than 1 task slower. --- ## Changelog Add ability to nest spawns ```rust fn main() { let pool = TaskPool::new(); pool.scope(|scope| { scope.spawn(async move { // calling scope.spawn from a spawned task was not possible before scope.spawn(async move { // do something }); }); }); } ``` ## Migration Guide If you were using explicit lifetimes and passing Scope you'll need to specify two lifetimes now. 
```rust fn scoped_function<'scope>(scope: &mut Scope<'scope, ()>) {} // should become fn scoped_function<'scope>(scope: &Scope<'_, 'scope, ()>) {} ``` `scope.spawn_local` changed to `scope.spawn_on_scope` this should cover cases where you needed to run tasks on the local thread, but does not cover spawning Nonsend Futures. ## TODO * [x] think real hard about all the lifetimes * [x] add doc about what 'env and 'scope mean. * [x] manually check that the single threaded task pool still works * [x] Get updated perf numbers * [x] check and make sure all the transmutes are necessary * [x] move commented out test into a compile fail test * [x] look through the tests for scope on std and see if I should add any more tests Co-authored-by: Michael Hsu <myhsu@benjaminelectric.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>	2022-09-28 01:59:10 +00:00
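The commit above says it borrows the lifetime approach of `std::thread::scope` to allow nested spawns. As a point of comparison, here is a minimal sketch of nested spawning with the standard library's scoped threads, not with Bevy's `TaskPool`. The function name `nested_spawn_count` is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Spawns a scoped thread that itself spawns a second thread on the same scope.
/// Returns how many of the two threads ran.
fn nested_spawn_count() -> usize {
    let counter = AtomicUsize::new(0);
    // Borrow once so that the `move` closures copy the reference, not the counter.
    let counter_ref = &counter;

    thread::scope(|s| {
        // `s` is a shared reference to the scope, so it can be copied into the closure.
        s.spawn(move || {
            counter_ref.fetch_add(1, Ordering::SeqCst);
            // Nested spawn: the inner thread is tied to the same scope.
            s.spawn(move || {
                counter_ref.fetch_add(1, Ordering::SeqCst);
            });
        });
    });

    // `thread::scope` only returns after every thread spawned on it has finished.
    counter.load(Ordering::SeqCst)
}

fn main() {
    println!("threads run: {}", nested_spawn_count());
}
```

Because spawning takes a shared reference to the scope, the reference can be handed to spawned work, which matches the motivation the commit gives for accepting `&Scope` instead of `&mut Scope`.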
Carter Anderson	dc3f801239	Exclusive Systems Now Implement `System`. Flexible Exclusive System Params (#6083 ) # Objective The [Stageless RFC](https://github.com/bevyengine/rfcs/pull/45) involves allowing exclusive systems to be referenced and ordered relative to parallel systems. We've agreed that unifying systems under `System` is the right move. This is an alternative to #4166 (see rationale in the comments I left there). Note that this builds on the learnings established there (and borrows some patterns). ## Solution This unifies parallel and exclusive systems under the shared `System` trait, removing the old `ExclusiveSystem` trait / impls. This is accomplished by adding a new `ExclusiveFunctionSystem` impl similar to `FunctionSystem`. It is backed by `ExclusiveSystemParam`, which is similar to `SystemParam`. There is a new flattened out SystemContainer api (which cuts out a lot of trait and type complexity). This means you can remove all cases of `exclusive_system()`: ```rust // before commands.add_system(some_system.exclusive_system()); // after commands.add_system(some_system); ``` I've also implemented `ExclusiveSystemParam` for `&mut QueryState` and `&mut SystemState`, which makes this possible in exclusive systems: ```rust fn some_exclusive_system( world: &mut World, transforms: &mut QueryState<&Transform>, state: &mut SystemState<(Res<Time>, Query<&Player>)>, ) { for transform in transforms.iter(world) { println!("{transform:?}"); } let (time, players) = state.get(world); for player in players.iter() { println!("{player:?}"); } } ``` Note that "exclusive function systems" assume `&mut World` is present (and the first param). I think this is a fair assumption, given that the presence of `&mut World` is what defines the need for an exclusive system. 
I added some targeted SystemParam `static` constraints, which removed the need for this: ``` rust fn some_exclusive_system(state: &mut SystemState<(Res<'static, Time>, Query<&'static Player>)>) {} ``` ## Related - #2923 - #3001 - #3946 ## Changelog - `ExclusiveSystem` trait (and implementations) has been removed in favor of sharing the `System` trait. - `ExclusiveFunctionSystem` and `ExclusiveSystemParam` were added, enabling flexible exclusive function systems - `&mut SystemState` and `&mut QueryState` now implement `ExclusiveSystemParam` - Exclusive and parallel System configuration is now done via a unified `SystemDescriptor`, `IntoSystemDescriptor`, and `SystemContainer` api. ## Migration Guide Calling `.exclusive_system()` is no longer required (or supported) for converting exclusive system functions to exclusive systems: ```rust // Old (0.8) app.add_system(some_exclusive_system.exclusive_system()); // New (0.9) app.add_system(some_exclusive_system); ``` Converting "normal" parallel systems to exclusive systems is done by calling the exclusive ordering apis: ```rust // Old (0.8) app.add_system(some_system.exclusive_system().at_end()); // New (0.9) app.add_system(some_system.at_end()); ``` Query state in exclusive systems can now be cached via ExclusiveSystemParams, which should be preferred for clarity and performance reasons: ```rust // Old (0.8) fn some_system(world: &mut World) { let mut transforms = world.query::<&Transform>(); for transform in transforms.iter(world) { } } // New (0.9) fn some_system(world: &mut World, transforms: &mut QueryState<&Transform>) { for transform in transforms.iter(world) { } } ```	2022-09-26 23:57:07 +00:00
Carter Anderson	01aedc8431	Spawn now takes a Bundle (#6054 ) # Objective Now that we can consolidate Bundles and Components under a single insert (thanks to #2975 and #6039), almost 100% of world spawns now look like `world.spawn().insert((Some, Tuple, Here))`. Spawning an entity without any components is an extremely uncommon pattern, so it makes sense to give spawn the "first class" ergonomic api. This consolidated api should be made consistent across all spawn apis (such as World and Commands). ## Solution All `spawn` apis (`World::spawn`, `Commands::spawn`, `ChildBuilder::spawn`, and `WorldChildBuilder::spawn`) now accept a bundle as input: ```rust // before: commands .spawn() .insert((A, B, C)); world .spawn() .insert((A, B, C)); // after commands.spawn((A, B, C)); world.spawn((A, B, C)); ``` All existing instances of `spawn_bundle` have been deprecated in favor of the new `spawn` api. A new `spawn_empty` has been added, replacing the old `spawn` api. By allowing `world.spawn(some_bundle)` to replace `world.spawn().insert(some_bundle)`, this opened the door to removing the initial entity allocation in the "empty" archetype / table done in `spawn()` (and subsequent move to the actual archetype in `.insert(some_bundle)`). This improves spawn performance by over 10%: ![image](https://user-images.githubusercontent.com/2694663/191627587-4ab2f949-4ccd-4231-80eb-80dd4d9ad6b9.png) To take this measurement, I added a new `world_spawn` benchmark. Unfortunately, optimizing `Commands::spawn` is slightly less trivial, as Commands expose the Entity id of spawned entities prior to actually spawning. Doing the optimization would (naively) require assurances that the `spawn(some_bundle)` command is applied before all other commands involving the entity (which would not necessarily be true, if memory serves). Optimizing `Commands::spawn` this way does feel possible, but it will require careful thought (and maybe some additional checks), which deserves its own PR. 
For now, it has the same performance characteristics as the current `Commands::spawn_bundle` on main. Note that 99% of this PR is simple renames and refactors. The only code that needs careful scrutiny is the new `World::spawn()` impl, which is relatively straightforward, but it has some new unsafe code (which re-uses the battle tested BundleSpawner code path). --- ## Changelog - All `spawn` apis (`World::spawn`, `Commands::spawn`, `ChildBuilder::spawn`, and `WorldChildBuilder::spawn`) now accept a bundle as input - All instances of `spawn_bundle` have been deprecated in favor of the new `spawn` api - World and Commands now have `spawn_empty()`, which is equivalent to the old `spawn()` behavior. ## Migration Guide ```rust // Old (0.8): commands .spawn() .insert_bundle((A, B, C)); // New (0.9) commands.spawn((A, B, C)); // Old (0.8): commands.spawn_bundle((A, B, C)); // New (0.9) commands.spawn((A, B, C)); // Old (0.8): let entity = commands.spawn().id(); // New (0.9) let entity = commands.spawn_empty().id(); // Old (0.8) let entity = world.spawn().id(); // New (0.9) let entity = world.spawn_empty().id(); ```	2022-09-23 19:55:54 +00:00
Carter Anderson	cd15f0f5be	Accept Bundles for insert and remove. Deprecate insert/remove_bundle (#6039 ) # Objective Take advantage of the "impl Bundle for Component" changes in #2975 / add the follow up changes discussed there. ## Solution - Change `insert` and `remove` to accept a Bundle instead of a Component (for both Commands and World) - Deprecate `insert_bundle`, `remove_bundle`, and `remove_bundle_intersection` - Add `remove_intersection` --- ## Changelog - `insert` and `remove` now accept a Bundle instead of a Component (for both Commands and World) - `insert_bundle` and `remove_bundle` are deprecated ## Migration Guide Replace `insert_bundle` with `insert`: ```rust // Old (0.8) commands.spawn().insert_bundle(SomeBundle::default()); // New (0.9) commands.spawn().insert(SomeBundle::default()); ``` Replace `remove_bundle` with `remove`: ```rust // Old (0.8) commands.entity(some_entity).remove_bundle::<SomeBundle>(); // New (0.9) commands.entity(some_entity).remove::<SomeBundle>(); ``` Replace `remove_bundle_intersection` with `remove_intersection`: ```rust // Old (0.8) world.entity_mut(some_entity).remove_bundle_intersection::<SomeBundle>(); // New (0.9) world.entity_mut(some_entity).remove_intersection::<SomeBundle>(); ``` Consider consolidating as many operations as possible to improve ergonomics and cut down on archetype moves: ```rust // Old (0.8) commands.spawn() .insert_bundle(SomeBundle::default()) .insert(SomeComponent); // New (0.9) - Option 1 commands.spawn().insert(( SomeBundle::default(), SomeComponent, )) // New (0.9) - Option 2 commands.spawn_bundle(( SomeBundle::default(), SomeComponent, )) ``` ## Next Steps Consider changing `spawn` to accept a bundle and deprecate `spawn_bundle`.	2022-09-21 21:47:53 +00:00
James Liu	5d821fe1a7	Start running systems while prepare_systems is running (#4919 ) # Objective While using the ParallelExecutor, systems do not actually start until `prepare_systems` completes. In stages where there are large numbers of "empty" systems with very little work to do, this delay adds significant overhead, which can add up over many stages. ## Solution Immediately and synchronously signal the start of systems that can run without dependencies inside `prepare_systems` instead of waiting for the first executor iteration after `prepare_systems` completes. Any system that is dependent on them still cannot run until after `prepare_systems` completes, but there are a large number of unconstrained systems in the base engine where this is a general benefit in almost every case. ## Performance This change was tested against `many_foxes` in the default configuration. As this change is sensitive to the overhead around scheduling systems, the spans for measuring system timing, system overhead, and system commands were all commented out for these measurements. The median stage timings between `main` and this PR are as follows: \|stage\|main\|this PR\| \|:--\|:--\|:--\| \|First\|75.54 us\|61.61 us\| \|LoadAssets\|51.05 us\|42.32 us\| \|PreUpdate\|54.6 us\|55.56 us\| \|Update\|61.89 us\|51.5 us\| \|PostUpdate\|7.27 ms\|6.71 ms\| \|AssetEvents\|47.82 us\|35.95 us\| \|Last\|39.19 us\|37.71 us\| \|reserve_and_flush\|57.83 us\|48.2 us\| \|Extract\|1.41 ms\|1.28 ms\| \|Prepare\|554.49 us\|502.53 us\| \|Queue\|216.29 us\|207.51 us\| \|Sort\|67.03 us\|60.99 us\| \|Render\|1.73 ms\|1.58 ms\| \|Cleanup\|33.55 us\|30.76 us\| \|Clear Entities\|18.56 us\|17.05 us\| \|full frame\|11.9 ms\|10.91 ms\| For the first few stages, the benefit is small but cumulative over each. 
For PostUpdate in particular, this allows `parent_update` to run while prepare_systems is running, which is required for the animation and transform propagation systems, which dominate the time spent in the stage, but also frontloads the contention as the other "empty" systems are also running while `parent_update` is running. For Render, where there is just a single large exclusive system, the benefit comes from not waiting on a spuriously scheduled task on the task pool to kick off the system: it's immediately scheduled to run.	2022-09-13 19:28:13 +00:00
ira	992681b59b	Make `Resource` trait opt-in, requiring `#[derive(Resource)]` V2 (#5577 ) This PR description is an edited copy of #5007, written by @alice-i-cecile. # Objective Follow-up to https://github.com/bevyengine/bevy/pull/2254. The `Resource` trait currently has a blanket implementation for all types that meet its bounds. While ergonomic, this results in several drawbacks: * it is possible to make confusing, silent mistakes such as inserting a function pointer (Foo) rather than a value (Foo::Bar) as a resource * it is challenging to discover if a type is intended to be used as a resource * we cannot later add customization options (see the [RFC](https://github.com/bevyengine/rfcs/blob/main/rfcs/27-derive-component.md) for the equivalent choice for Component). * dependencies can use the same Rust type as a resource in invisibly conflicting ways * raw Rust types used as resources cannot preserve privacy appropriately, as anyone able to access that type can read and write to internal values * we cannot capture a definitive list of possible resources to display to users in an editor ## Notes to reviewers * Review this commit-by-commit; there's effectively no back-tracking and there's a lot of churn in some of these commits. ira: My commits are not as well organized :') * I've relaxed the bound on Local to Send + Sync + 'static: I don't think these concerns apply there, so this can keep things simple. Storing e.g. a u32 in a Local is fine, because there's a variable name attached explaining what it does. * I think this is a bad place for the Resource trait to live, but I've left it in place to make reviewing easier. IMO that's best tackled with https://github.com/bevyengine/bevy/issues/4981. ## Changelog `Resource` is no longer automatically implemented for all matching types. Instead, use the new `#[derive(Resource)]` macro. ## Migration Guide Add `#[derive(Resource)]` to all types you are using as a resource. 
If you are using a third party type as a resource, wrap it in a tuple struct to bypass orphan rules. Consider deriving `Deref` and `DerefMut` to improve ergonomics. `ClearColor` no longer implements `Component`. Using `ClearColor` as a component in 0.8 did nothing. Use the `ClearColorConfig` in the `Camera3d` and `Camera2d` components instead. Co-authored-by: Alice <alice.i.cecile@gmail.com> Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com> Co-authored-by: devil-ira <justthecooldude@gmail.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>	2022-08-08 21:36:35 +00:00
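The migration advice in the commit above (wrap a third-party type in a tuple struct, then expose the inner value through `Deref`/`DerefMut`) can be sketched in plain Rust. This is a minimal, std-only illustration under stated assumptions: `ExternalCounter` and `CounterResource` are hypothetical names, the `Deref` impls are written by hand rather than derived, and the `#[derive(Resource)]` attribute is left out so the snippet compiles without Bevy.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical stand-in for a type defined in a third-party crate.
#[derive(Debug, Default, PartialEq)]
pub struct ExternalCounter {
    pub count: u32,
}

impl ExternalCounter {
    pub fn bump(&mut self) {
        self.count += 1;
    }
}

// Local tuple-struct wrapper. In a Bevy app this is where
// `#[derive(Resource)]` would go (omitted here to stay std-only).
#[derive(Debug, Default)]
pub struct CounterResource(pub ExternalCounter);

impl Deref for CounterResource {
    type Target = ExternalCounter;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl DerefMut for CounterResource {
    fn deref_mut(&mut self) -> &mut Self::Target {
        &mut self.0
    }
}

fn main() {
    let mut res = CounterResource::default();
    // Method calls and field reads go through the wrapper via auto-deref.
    res.bump();
    res.bump();
    println!("count = {}", res.count);
}
```

Because the wrapper is a local type, the orphan rule no longer blocks trait implementations on it, and call sites rarely need to spell out `.0`.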
ira	4847f7e3ad	Update codebase to use `IntoIterator` where possible. (#5269 ) Remove unnecessary calls to `iter()`/`iter_mut()`. Mainly updates the use of queries in our code, docs, and examples. ```rust // From for _ in list.iter() { for _ in list.iter_mut() { // To for _ in &list { for _ in &mut list { ``` We already enable the pedantic lint [clippy::explicit_iter_loop](https://rust-lang.github.io/rust-clippy/stable/) inside of Bevy. However, this only warns for a few known types from the standard library. ## Note for reviewers As you can see the additions and deletions are exactly equal. Maybe give it a quick skim to check I didn't sneak in a crypto miner, but you don't have to torture yourself by reading every line. I already experienced enough pain making this PR :) Co-authored-by: devil-ira <justthecooldude@gmail.com>	2022-07-11 15:28:50 +00:00
Jakob Hellermann	d38a8dfdd7	add more `SAFETY` comments and lint for missing ones in `bevy_ecs` (#4835 ) # Objective `SAFETY` comments are meant to be placed before `unsafe` blocks and should contain the reasoning of why in this case the usage of unsafe is okay. This is useful when reading the code because it makes it clear which assumptions are required for safety, and makes it easier to spot possible unsoundness holes. It also forces the code writer to think of something to write and maybe look at the safety contracts of any called unsafe methods again to double-check their correct usage. There's a clippy lint called `undocumented_unsafe_blocks` which warns when using an `unsafe` block without such a comment. ## Solution - since clippy expects `SAFETY` instead of `SAFE`, rename those - add `SAFETY` comments in more places - for the last remaining 3 places, add an `#[allow()]` and `// TODO` since I wasn't comfortable enough with the code to justify their safety - add ` #![warn(clippy::undocumented_unsafe_blocks)]` to `bevy_ecs` ### Note for reviewers The first commit only renames `SAFE` to `SAFETY` so it doesn't need a thorough review. `cb042a416e..55cef2d6fa` is the diff for all other changes. ### Safety comments where I'm not too familiar with the code `774012ece5/crates/bevy_ecs/src/entity/mod.rs (L540-L546)` `774012ece5/crates/bevy_ecs/src/world/entity_ref.rs (L249-L252)` ### Locations left undocumented with a `TODO` comment `5dde944a30/crates/bevy_ecs/src/schedule/executor_parallel.rs (L196-L199)` `5dde944a30/crates/bevy_ecs/src/world/entity_ref.rs (L287-L289)` `5dde944a30/crates/bevy_ecs/src/world/entity_ref.rs (L413-L415)` Co-authored-by: Jakob Hellermann <hellermann@sipgate.de>	2022-07-04 14:44:24 +00:00
James Liu	012ae07dc8	Add global init and get accessors for all newtyped TaskPools (#2250 ) Right now, a direct reference to the target TaskPool is required to launch tasks on the pools, despite the three newtyped pools (AsyncComputeTaskPool, ComputeTaskPool, and IoTaskPool) effectively acting as global instances. The need to pass a TaskPool reference adds notable friction to spawning subtasks within existing tasks. Possible use cases for this may include chaining tasks within the same pool like spawning separate send/receive I/O tasks after waiting on a network connection to be established, or allowing cross-pool dependent tasks like starting dependent multi-frame computations following a long I/O load. Other task execution runtimes provide static access to spawning tasks (i.e. `tokio::spawn`), which is notably easier to use than the reference passing required by `bevy_tasks` right now. This PR does the following: * Adds `TaskPool::init` which initializes a `OnceCell`'ed pool with a provided TaskPool, failing if the pool has already been initialized. Adds `TaskPool::get` which fetches the initialized global pool of the respective type or panics. This generally should not be an issue in normal Bevy use, as the pools are initialized before they are accessed. Updated default task pool initialization to either pull the global handles and save them as resources, or if they are already initialized, pull a cloned global handle as the resource. This should make it notably easier to build more complex task hierarchies for dependent tasks. It should also make writing bevy-adjacent, but not strictly bevy-only plugin crates easier, as the global pools ensure it's all running on the same threads. One alternative considered is keeping a thread-local reference to the pool for all threads in each pool to enable the same `tokio::spawn` interface. This would spawn tasks on the same pool that a task is currently running in. 
However, this potentially leads to footgun situations where long running blocking tasks run on `ComputeTaskPool`.	2022-06-09 02:43:24 +00:00
Torne Wuff	b1afe2dcca	Make `System` responsible for updating its own archetypes (#4115 ) # Objective - Make it possible to use `System`s outside of the scheduler/executor without having to define logic to track new archetypes and call `System::add_archetype()` for each. ## Solution - Replace `System::add_archetype(&Archetype)` with `System::update_archetypes(&World)`, making systems responsible for tracking their own most recent archetype generation the way that `SystemState` already does. This has minimal (or simplifying) effect on most of the code with the exception of `FunctionSystem`, which must now track the latest `ArchetypeGeneration` it saw instead of relying on the executor to do it. Co-authored-by: Carter Anderson <mcanders1@gmail.com>	2022-04-07 20:50:43 +00:00
Carter Anderson	de677dbfc9	Use more ergonomic span syntax (#4246 ) Tracing added support for "inline span entering", which cuts down on a lot of complexity: ```rust let span = info_span!("my_span").entered(); ``` This adapts our code to use this pattern where possible, and updates our docs to recommend it. This produces equivalent tracing behavior. Here is a side by side profile of "before" and "after" these changes. ![image](https://user-images.githubusercontent.com/2694663/158912137-b0aa6dc8-c603-425f-880f-6ccf5ad1b7ef.png)	2022-03-18 04:19:21 +00:00
Alice Cecile	557ab9897a	Make get_resource (and friends) infallible (#4047 ) # Objective - In the large majority of cases, users were calling `.unwrap()` immediately after `.get_resource`. - Attempting to add more helpful error messages here resulted in endless manual boilerplate (see #3899 and the linked PRs). ## Solution - Add an infallible variant named `.resource` and so on. - Use these infallible variants over `.get_resource().unwrap()` across the code base. ## Notes I did not provide equivalent methods on `WorldCell`, in favor of removing it entirely in #3939. ## Migration Guide Infallible variants of `.get_resource` have been added that implicitly panic, rather than needing to be unwrapped. Replace `world.get_resource::<Foo>().unwrap()` with `world.resource::<Foo>()`. ## Impact - `.unwrap` search results before: 1084 - `.unwrap` search results after: 942 - internal `unwrap_or_else` calls added: 4 - trivial unwrap calls removed from tests and code: 146 - uses of the new `try_get_resource` API: 11 - percentage of the time the unwrapping API was used internally: 93%	2022-02-27 22:37:18 +00:00
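The fallible/infallible accessor pair described in the commit above can be illustrated with a small std-only sketch. This is not Bevy's `World`: `Store` is a hypothetical type-keyed container invented for illustration, and only the relationship between `get_resource` (returns an `Option`) and `resource` (panics with a descriptive message) is meant to mirror the commit.

```rust
use std::any::{type_name, Any, TypeId};
use std::collections::HashMap;

// Hypothetical container holding at most one value per type.
#[derive(Default)]
pub struct Store {
    values: HashMap<TypeId, Box<dyn Any>>,
}

impl Store {
    pub fn insert<T: Any>(&mut self, value: T) {
        self.values.insert(TypeId::of::<T>(), Box::new(value));
    }

    // Fallible accessor: the caller decides how to handle absence.
    pub fn get_resource<T: Any>(&self) -> Option<&T> {
        self.values.get(&TypeId::of::<T>())?.downcast_ref::<T>()
    }

    // Infallible accessor: panics with the type name, so call sites
    // no longer need `.get_resource().unwrap()`.
    pub fn resource<T: Any>(&self) -> &T {
        self.get_resource::<T>().unwrap_or_else(|| {
            panic!("requested resource {} does not exist", type_name::<T>())
        })
    }
}

fn main() {
    let mut store = Store::default();
    store.insert(42u32);
    println!("value = {}", store.resource::<u32>());
}
```

Centralizing the panic in one method keeps the error message consistent and removes the boilerplate unwrap at each call site.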
danieleades	d8974e7c3d	small and mostly pointless refactoring (#2934 ) What it says on the tin. This has got more to do with making `clippy` slightly more quiet than it does with changing anything that might greatly impact readability or performance. That said, deriving `Default` for a couple of structs is a nice easy win	2022-02-13 22:33:55 +00:00
Alice Cecile	bdbf626341	Implement init_resource for `Commands` and `World` (#3079 ) # Objective - Fixes #3078 - Fixes #1397 ## Solution - Implement Commands::init_resource. - Also implement for World, for consistency and to simplify internal structure. - While we're here, clean up some of the docs for Command and World resource modification.	2022-02-08 23:04:19 +00:00
François	1468211e2b	fix unreachable macro calls for rust 2021 (#3889 ) # Objective - It was decided in Rust 2021 to make macro like `panic` require a string literal to format instead of directly an object - `unreachable` was missed during the first pass but it was decided to go for it anyway now: https://github.com/rust-lang/rust/issues/92137#issuecomment-1019519285 - this is making Bevy CI fail now: https://github.com/bevyengine/bevy/runs/5102586734?check_suite_focus=true ## Solution - Fix calls to `unreachable`	2022-02-08 02:59:54 +00:00
bilsen	1f99363de9	Add &World as SystemParam (#2923 ) # Objective Make it possible to use `&World` as a system parameter ## Solution It seems like all the pieces were already in place, very simple impl Co-authored-by: Carter Anderson <mcanders1@gmail.com>	2022-02-03 23:43:25 +00:00
Michael Dorst	507441d96f	Fix `doc_markdown` lints in `bevy_ecs` (#3473 ) #3457 adds the `doc_markdown` clippy lint, which checks doc comments to make sure code identifiers are escaped with backticks. This causes a lot of lint errors, so this is one of a number of PR's that will fix those lint errors one crate at a time. This PR fixes lints in the `bevy_ecs` crate.	2022-01-06 00:43:37 +00:00
Mike	851b5939ce	add tracing spans for parallel executor and system overhead (#3416 ) This PR adds tracing spans for the parallel executor and system overhead. ![image](https://user-images.githubusercontent.com/2180432/147172747-b78026e3-1c30-4120-92c8-693c6f1564cd.png)	2021-12-23 19:03:44 +00:00
Daniel McNab	0ee4195fb0	Remove some superfluous unsafe code (#3297 ) # Objective - This `unsafe` is weird ## Solution - Don't use `unsafe` here Hopefully this isn't already in an open PR.	2021-12-11 22:58:46 +00:00
Carter Anderson	7dd92e72d4	More Bevy ECS schedule spans (#3281 ) Fills in some gaps we had in our Bevy ECS tracing spans: * Exclusive systems * System Commands (for `apply_buffers = true` cases) * System archetype updates * Parallel system execution prep	2021-12-08 23:43:03 +00:00
Christopher Durham	0887f41b58	Fix bevy_ecs::schedule::executor_parallel::system span management (#2905 ) # Objective - Fixes #2904 (see for context) ## Solution - Simply hoist span creation out of the threaded task - Confirmed to solve the issue locally Now all events have the full span parent tree up through `bevy_ecs::schedule::stage` all the way to `bevy_app::app::bevy_app` (and its parents in bevy-consumer code, if any).	2021-10-06 19:00:35 +00:00
Paweł Grabarz	07ed1d053e	Implement and require `#[derive(Component)]` on all component structs (#2254 ) This implements the most minimal variant of #1843 - a derive for marker trait. This is a prerequisite to more complicated features like statically defined storage type or opt-out component reflection. In order to make component struct's purpose explicit and avoid misuse, it must be annotated with `#[derive(Component)]` (manual impl is discouraged for compatibility). Right now this is just a marker trait, but in the future it might be expanded. Making this change early allows us to make further changes later without breaking backward compatibility for derive macro users. This already prevents a lot of issues, like using bundles in `insert` calls. Primitive types are no longer valid components as well. This can be easily worked around by adding newtype wrappers and deriving `Component` for them. One funny example of prevented bad code (from our own tests) is when an newtype struct or enum variant is used. Previously, it was possible to write `insert(Newtype)` instead of `insert(Newtype(value))`. That code compiled, because function pointers (in this case newtype struct constructor) implement `Send + Sync + 'static`, so we allowed them to be used as components. This is no longer the case and such invalid code will trigger a compile error. Co-authored-by: = <=> Co-authored-by: TheRawMeatball <therawmeatball@gmail.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>	2021-10-03 19:23:44 +00:00
François	b724a0f586	Down with the system! (#2496 ) # Objective - Remove every `.system()` call possible. - Check for remaining missing cases. ## Solution - Remove all `.system()`, fix compile errors - 32 calls to `.system()` remain, mostly internals; the few others should be removed after #2446	2021-07-27 23:42:36 +00:00
François	5c4909dbb2	update archetypes for run criterias (#2177 ) fixes #2000 archetypes were not updated for run criteria on a stage or on a system set	2021-07-13 22:12:21 +00:00
Paweł Grabarz	93cc7219bc	small ecs cleanup and remove_bundle drop bugfix (#2172 ) - simplified code around archetype generations a little bit, as the special case value is not actually needed - removed unnecessary UnsafeCell around pointer value that is never updated through shared references - fixed and added a test for correct drop behaviour when removing sparse components through remove_bundle command	2021-05-18 19:25:57 +00:00
François	6f7da027c7	Automatic System Spans (#2033 ) As mentioned in https://github.com/bevyengine/bevy/issues/2025#issuecomment-827867660, systems used to have spans by default. * add spans by default for every system executed * create folder if missing for feature `wgpu_trace`	2021-04-28 18:41:16 +00:00
Carter Anderson	b17f8a4bce	format comments (#1612 ) Uses the new unstable comment formatting features added to rustfmt.toml.	2021-03-11 00:27:30 +00:00
Carter Anderson	3a2a68852c	Bevy ECS V2 (#1525 ) # Bevy ECS V2 This is a rewrite of Bevy ECS (basically everything but the new executor/schedule, which are already awesome). The overall goal was to improve the performance and versatility of Bevy ECS. Here is a quick bulleted list of changes before we dive into the details: * Complete World rewrite * Multiple component storage types: * Tables: fast cache friendly iteration, slower add/removes (previously called Archetypes) * Sparse Sets: fast add/remove, slower iteration * Stateful Queries (caches query results for faster iteration. fragmented iteration is _fast_ now) * Stateful System Params (caches expensive operations. inspired by @DJMcNab's work in #1364) * Configurable System Params (users can set configuration when they construct their systems. once again inspired by @DJMcNab's work) * Archetypes are now "just metadata", component storage is separate * Archetype Graph (for faster archetype changes) * Component Metadata * Configure component storage type * Retrieve information about component size/type/name/layout/send-ness/etc * Components are uniquely identified by a densely packed ComponentId * TypeIds are now totally optional (which should make implementing scripting easier) * Super fast "for_each" query iterators * Merged Resources into World. 
Resources are now just a special type of component * EntityRef/EntityMut builder apis (more efficient and more ergonomic) * Fast bitset-backed `Access<T>` replaces old hashmap-based approach everywhere * Query conflicts are determined by component access instead of archetype component access (to avoid random failures at runtime) * With/Without are still taken into account for conflicts, so this should still be comfy to use * Much simpler `IntoSystem` impl * Significantly reduced the amount of hashing throughout the ecs in favor of Sparse Sets (indexed by densely packed ArchetypeId, ComponentId, BundleId, and TableId) * Safety Improvements * Entity reservation uses a normal world reference instead of unsafe transmute * QuerySets no longer transmute lifetimes * Made traits "unsafe" where relevant * More thorough safety docs * WorldCell * Exposes safe mutable access to multiple resources at a time in a World * Replaced "catch all" `System::update_archetypes(world: &World)` with `System::new_archetype(archetype: &Archetype)` * Simpler Bundle implementation * Replaced slow "remove_bundle_one_by_one" used as fallback for Commands::remove_bundle with fast "remove_bundle_intersection" * Removed `Mut<T>` query impl. it is better to only support one way: `&mut T` * Removed with() from `Flags<T>` in favor of `Option<Flags<T>>`, which allows querying for flags to be "filtered" by default * Components now have is_send property (currently only resources support non-send) * More granular module organization * New `RemovedComponents<T>` SystemParam that replaces `query.removed::<T>()` * `world.resource_scope()` for mutable access to resources and world at the same time * WorldQuery and QueryFilter traits unified. FilterFetch trait added to enable "short circuit" filtering. 
Auto impled for cases that don't need it * Significantly slimmed down SystemState in favor of individual SystemParam state * System Commands changed from `commands: &mut Commands` back to `mut commands: Commands` (to allow Commands to have a World reference) Fixes #1320 ## `World` Rewrite This is a from-scratch rewrite of `World` that fills the niche that `hecs` used to. Yes, this means Bevy ECS is no longer a "fork" of hecs. We're going out on our own! (the only shared code between the projects is the entity id allocator, which is already basically ideal) A huge shout out to @SanderMertens (author of [flecs](https://github.com/SanderMertens/flecs)) for sharing some great ideas with me (specifically hybrid ecs storage and archetype graphs). He also helped advise on a number of implementation details. ## Component Storage (The Problem) Two ECS storage paradigms have gained a lot of traction over the years: * Archetypal ECS: * Stores components in "tables" with static schemas. Each "column" stores components of a given type. Each "row" is an entity. * Each "archetype" has its own table. Adding/removing an entity's component changes the archetype. * Enables super-fast Query iteration due to its cache-friendly data layout * Comes at the cost of more expensive add/remove operations for an Entity's components, because all components need to be copied to the new archetype's "table" * Sparse Set ECS: * Stores components of the same type in densely packed arrays, which are sparsely indexed by densely packed unsigned integers (Entity ids) * Query iteration is slower than Archetypal ECS because each entity's component could be at any position in the sparse set. This "random access" pattern isn't cache friendly. Additionally, there is an extra layer of indirection because you must first map the entity id to an index in the component array. * Adding/removing components is a cheap, constant time operation Bevy ECS V1, hecs, legion, flecs, and Unity DOTS are all "archetypal ecs-es". 
I personally think "archetypal" storage is a good default for game engines. An entity's archetype doesn't need to change frequently in general, and it creates "fast by default" query iteration (which is a much more common operation). It is also "self optimizing". Users don't need to think about optimizing component layouts for iteration performance. It "just works" without any extra boilerplate. Shipyard and EnTT are "sparse set ecs-es". They employ "packing" as a way to work around the "suboptimal by default" iteration performance for specific sets of components. This helps, but I didn't think this was a good choice for a general purpose engine like Bevy because: 1. "packs" conflict with each other. If bevy decides to internally pack the Transform and GlobalTransform components, users are then blocked if they want to pack some custom component with Transform. 2. users need to take manual action to optimize Developers selecting an ECS framework are stuck with a hard choice. Select an "archetypal" framework with "fast iteration everywhere" but without the ability to cheaply add/remove components, or select a "sparse set" framework to cheaply add/remove components but with slower iteration performance. ## Hybrid Component Storage (The Solution) In Bevy ECS V2, we get to have our cake and eat it too. It now has _both_ of the component storage types above (and more can be added later if needed): * Tables (aka "archetypal" storage) * The default storage. If you don't configure anything, this is what you get * Fast iteration by default * Slower add/remove operations * Sparse Sets * Opt-in * Slower iteration * Faster add/remove operations These storage types complement each other perfectly. By default Query iteration is fast. 
If developers know that they want to add/remove a component at high frequencies, they can set the storage to "sparse set": ```rust world.register_component( ComponentDescriptor::new::<MyComponent>(StorageType::SparseSet) ).unwrap(); ``` ## Archetypes Archetypes are now "just metadata" ... they no longer store components directly. They do store: * The `ComponentId`s of each of the Archetype's components (and that component's storage type) * Archetypes are uniquely defined by their component layouts * For example: entities with "table" components `[A, B, C]` _and_ "sparse set" components `[D, E]` will always be in the same archetype. * The `TableId` associated with the archetype * For now each archetype has exactly one table (which can have no components), * There is a 1->Many relationship from Tables->Archetypes. A given table could have any number of archetype components stored in it: * Ex: an entity with "table storage" components `[A, B, C]` and "sparse set" components `[D, E]` will share the same `[A, B, C]` table as an entity with `[A, B, C]` table component and `[F]` sparse set components. * This 1->Many relationship is how we preserve fast "cache friendly" iteration performance when possible (more on this later) * A list of entities that are in the archetype and the row id of the table they are in * ArchetypeComponentIds * unique densely packed identifiers for (ArchetypeId, ComponentId) pairs * used by the schedule executor for cheap system access control * "Archetype Graph Edges" (see the next section) ## The "Archetype Graph" Archetype changes in Bevy (and a number of other archetypal ecs-es) have historically been expensive to compute. First, you need to allocate a new vector of the entity's current component ids, add or remove components based on the operation performed, sort it (to ensure it is order-independent), then hash it to find the archetype (if it exists). 
And that's all before we get to the _already_ expensive full copy of all components to the new table storage. The solution is to build a "graph" of archetypes to cache these results. @SanderMertens first exposed me to the idea (and he got it from @gjroelofs, who came up with it). They propose adding directed edges between archetypes for add/remove component operations. If `ComponentId`s are densely packed, you can use sparse sets to cheaply jump between archetypes. Bevy takes this one step further by using add/remove `Bundle` edges instead of `Component` edges. Bevy encourages the use of `Bundles` to group add/remove operations. This is largely for "clearer game logic" reasons, but it also helps cut down on the number of archetype changes required. `Bundles` now also have densely-packed `BundleId`s. This allows us to use a _single_ edge for each bundle operation (rather than needing to traverse N edges ... one for each component). Single component operations are also bundles, so this is strictly an improvement over a "component only" graph. As a result, an operation that used to be _heavy_ (both for allocations and compute) is now two dirt-cheap array lookups and zero allocations. ## Stateful Queries World queries are now stateful. This allows us to: 1. Cache archetype (and table) matches * This resolves another issue with (naive) archetypal ECS: query performance getting worse as the number of archetypes goes up (and fragmentation occurs). 2. Cache Fetch and Filter state * The expensive parts of fetch/filter operations (such as hashing the TypeId to find the ComponentId) now only happen once when the Query is first constructed 3. 
Incrementally build up state * When new archetypes are added, we only process the new archetypes (no need to rebuild state for old archetypes) As a result, the direct `World` query api now looks like this: ```rust let mut query = world.query::<(&A, &mut B)>(); for (a, mut b) in query.iter_mut(&mut world) { } ``` Requiring `World` to generate stateful queries (rather than letting the `QueryState` type be constructed separately) allows us to ensure that _all_ queries are properly initialized (and the relevant world state, such as ComponentIds). This enables QueryState to remove branches from its operations that check for initialization status (and also enables query.iter() to take an immutable world reference because it doesn't need to initialize anything in world). However in systems, this is a non-breaking change. State management is done internally by the relevant SystemParam. ## Stateful SystemParams Like Queries, `SystemParams` now also cache state. For example, `Query` system params store the "stateful query" state mentioned above. Commands store their internal `CommandQueue`. This means you can now safely use as many separate `Commands` parameters in your system as you want. `Local<T>` system params store their `T` value in their state (instead of in Resources). SystemParam state also enabled a significant slim-down of SystemState. It is much nicer to look at now. Per-SystemParam state naturally insulates us from an "aliased mut" class of errors we have hit in the past (ex: using multiple `Commands` system params). (credit goes to @DJMcNab for the initial idea and draft pr here #1364) ## Configurable SystemParams @DJMcNab also had the great idea to make SystemParams configurable. This allows users to provide some initial configuration / values for system parameters (when possible). 
Most SystemParams have no config (the config type is `()`), but the `Local<T>` param now supports user-provided parameters: ```rust fn foo(value: Local<usize>) { } app.add_system(foo.system().config(|c| c.0 = Some(10))); ``` ## Uber Fast "for_each" Query Iterators Developers now have the choice to use a fast "for_each" iterator, which yields ~1.5-3x iteration speed improvements for "fragmented iteration", and minor ~1.2x iteration speed improvements for unfragmented iteration. ```rust fn system(query: Query<(&A, &mut B)>) { // you now have the option to do this for a speed boost query.for_each_mut(|(a, mut b)| { }); // however normal iterators are still available for (a, mut b) in query.iter_mut() { } } ``` I think in most cases we should continue to encourage "normal" iterators as they are more flexible and more "rust idiomatic". But when that extra "oomf" is needed, it makes sense to use `for_each`. We should also consider using `for_each` for internal bevy systems to give our users a nice speed boost (but that should be a separate pr). ## Component Metadata `World` now has a `Components` collection, which is accessible via `world.components()`. This stores mappings from `ComponentId` to `ComponentInfo`, as well as `TypeId` to `ComponentId` mappings (where relevant). `ComponentInfo` stores information about the component, such as ComponentId, TypeId, memory layout, send-ness (currently limited to resources), and storage type. ## Significantly Cheaper `Access<T>` We used to use `TypeAccess<TypeId>` to manage read/write component/archetype-component access. This was expensive because TypeIds must be hashed and compared individually. The parallel executor got around this by "condensing" type ids into bitset-backed access types. This worked, but it had to be re-generated from the `TypeAccess<TypeId>`sources every time archetypes changed. This pr removes TypeAccess in favor of faster bitset access everywhere. 
We can do this thanks to the move to densely packed `ComponentId`s and `ArchetypeComponentId`s. ## Merged Resources into World Resources had a lot of redundant functionality with Components. They stored typed data, they had access control, they had unique ids, they were queryable via SystemParams, etc. In fact the _only_ major difference between them was that they were unique (and didn't correlate to an entity). Separate resources also had the downside of requiring a separate set of access controls, which meant the parallel executor needed to compare more bitsets per system and manage more state. I initially got the "separate resources" idea from `legion`. I think that design was motivated by the fact that it made the direct world query/resource lifetime interactions more manageable. It certainly made our lives easier when using Resources alongside hecs/bevy_ecs. However we already have a construct for safely and ergonomically managing in-world lifetimes: systems (which use `Access<T>` internally). This pr merges Resources into World: ```rust world.insert_resource(1); world.insert_resource(2.0); let a = world.get_resource::<i32>().unwrap(); let mut b = world.get_resource_mut::<f64>().unwrap(); *b = 3.0; ``` Resources are now just a special kind of component. They have their own ComponentIds (and their own resource TypeId->ComponentId scope, so they don't conflict with components of the same type). They are stored in a special "resource archetype", which stores components inside the archetype using a new `unique_components` sparse set (note that this sparse set could later be used to implement Tags). This allows us to keep the code size small by reusing existing datastructures (namely Column, Archetype, ComponentFlags, and ComponentInfo). This allows the executor to use a single `Access<ArchetypeComponentId>` per system. It should also make scripting language integration easier. _But_ this merge did create problems for people directly interacting with `World`. 
What if you need mutable access to multiple resources at the same time? `world.get_resource_mut()` borrows World mutably! ## WorldCell WorldCell applies the `Access<ArchetypeComponentId>` concept to direct world access: ```rust let world_cell = world.cell(); let a = world_cell.get_resource_mut::<i32>().unwrap(); let b = world_cell.get_resource_mut::<f64>().unwrap(); ``` This adds cheap runtime checks (a sparse set lookup of `ArchetypeComponentId` and a counter) to ensure that world accesses do not conflict with each other. Each operation returns a `WorldBorrow<'w, T>` or `WorldBorrowMut<'w, T>` wrapper type, which will release the relevant ArchetypeComponentId resources when dropped. World caches the access sparse set (and only one cell can exist at a time), so `world.cell()` is a cheap operation. WorldCell does _not_ use atomic operations. It is non-send, does a mutable borrow of world to prevent other accesses, and uses a simple `Rc<RefCell<ArchetypeComponentAccess>>` wrapper in each WorldBorrow pointer. The api is currently limited to resource access, but it can and should be extended to queries / entity component access. ## Resource Scopes WorldCell does not yet support component queries, and even when it does there are sometimes legitimate reasons to want a mutable world ref _and_ a mutable resource ref (ex: bevy_render and bevy_scene both need this). In these cases we could always drop down to the unsafe `world.get_resource_unchecked_mut()`, but that is not ideal! Instead developers can use a "resource scope" ```rust world.resource_scope(|world: &mut World, a: &mut A| { }) ``` This temporarily removes the `A` resource from `World`, provides mutable pointers to both, and re-adds A to World when finished. Thanks to the move to ComponentIds/sparse sets, this is a cheap operation. If multiple resources are required, scopes can be nested. We could also consider adding a "resource tuple" to the api if this pattern becomes common and the boilerplate gets nasty. 
## Query Conflicts Use ComponentId Instead of ArchetypeComponentId

For safety reasons, systems cannot contain queries that conflict with each other without wrapping them in a QuerySet. On bevy `main`, we use ArchetypeComponentIds to determine conflicts. This is nice because it can take into account filters:

```rust
// these queries will never conflict due to their filters
fn filter_system(a: Query<&mut A, With<B>>, b: Query<&mut A, Without<B>>) {
}
```

But it also has a significant downside:

```rust
// these queries will not conflict _until_ an entity with A, B, and C is spawned
fn maybe_conflicts_system(a: Query<(&mut A, &C)>, b: Query<(&mut A, &B)>) {
}
```

The system above will panic at runtime if an entity with A, B, and C is spawned. This makes it hard to trust that your game logic will run without crashing.

In this pr, I switched to using `ComponentId` instead. This _is_ more constraining. `maybe_conflicts_system` will now always fail, but it will do it consistently at startup. Naively, it would also _disallow_ `filter_system`, which would be a significant downgrade in usability. Bevy has a number of internal systems that rely on disjoint queries and I expect it to be a common pattern in userspace.

To resolve this, I added a new `FilteredAccess<T>` type, which wraps `Access<T>` and adds with/without filters. If two `FilteredAccess` have with/without values that prove they are disjoint, they will no longer conflict.

## EntityRef / EntityMut

World entity operations on `main` require that the user passes in an `entity` id to each operation:

```rust
let entity = world.spawn((A, )); // create a new entity with A
world.get::<A>(entity);
world.insert(entity, (B, C));
world.insert_one(entity, D);
```

This means that each operation needs to look up the entity location / verify its validity. The initial spawn operation also requires a Bundle as input. This can be awkward when no components are required (or one component is required).
These operations have been replaced by `EntityRef` and `EntityMut`, which are "builder-style" wrappers around world that provide read and read/write operations on a single, pre-validated entity:

```rust
// spawn now takes no inputs and returns an EntityMut
let entity = world.spawn()
    .insert(A) // insert a single component into the entity
    .insert_bundle((B, C)) // insert a bundle of components into the entity
    .id(); // id returns the Entity id

// Returns EntityMut (or panics if the entity does not exist)
world.entity_mut(entity)
    .insert(D)
    .insert_bundle(SomeBundle::default());
{
    // returns EntityRef (or panics if the entity does not exist)
    let d = world.entity(entity)
        .get::<D>() // gets the D component
        .unwrap();
    // world.get still exists for ergonomics
    let d = world.get::<D>(entity).unwrap();
}

// These variants return Options if you want to check existence instead of panicking
world.get_entity_mut(entity)
    .unwrap()
    .insert(E);

if let Some(entity_ref) = world.get_entity(entity) {
    let d = entity_ref.get::<D>().unwrap();
}
```

This _does not_ affect the current Commands api or terminology. I think that should be a separate conversation as that is a much larger breaking change.

## Safety Improvements

* Entity reservation in Commands uses a normal world borrow instead of an unsafe transmute
* QuerySets no longer transmute lifetimes
* Made traits "unsafe" when implementing a trait incorrectly could cause unsafety
* More thorough safety docs

## RemovedComponents SystemParam

The old approach to querying removed components: `query.removed::<T>()` was confusing because it had no connection to the query itself. I replaced it with the following, which is both clearer and allows us to cache the ComponentId mapping in the SystemParamState:

```rust
fn system(removed: RemovedComponents<T>) {
    for entity in removed.iter() {
    }
}
```

## Simpler Bundle implementation

Bundles are no longer responsible for sorting (or deduping) TypeInfo.
They are just a simple ordered list of component types / data. This makes the implementation smaller and opens the door to an easy "nested bundle" implementation in the future (which i might even add in this pr). Duplicate detection is now done once per bundle type by World the first time a bundle is used.

## Unified WorldQuery and QueryFilter types

(don't worry they are still separate type _parameters_ in Queries .. this is a non-breaking change)

WorldQuery and QueryFilter were already basically identical apis. With the addition of `FetchState` and more storage-specific fetch methods, the overlap was even clearer (and the redundancy more painful).

QueryFilters are now just `F: WorldQuery where F::Fetch: FilterFetch`. FilterFetch requires `Fetch<Item = bool>` and adds new "short circuit" variants of fetch methods. This enables a filter tuple like `(With<A>, Without<B>, Changed<C>)` to stop evaluating the filter after the first mismatch is encountered. FilterFetch is automatically implemented for `Fetch` implementations that return bool.

This forces fetch implementations that return things like `(bool, bool, bool)` (such as the filter above) to manually implement FilterFetch and decide whether or not to short-circuit.

## More Granular Modules

World no longer globs all of the internal modules together. It now exports `core`, `system`, and `schedule` separately. I'm also considering exporting `core` submodules directly as that is still pretty "glob-ey" and unorganized (feedback welcome here).

## Remaining Draft Work (to be done in this pr)

* ~~panic on conflicting WorldQuery fetches (&A, &mut A)~~
  * ~~bevy `main` and hecs both currently allow this, but we should protect against it if possible~~
* ~~batch_iter / par_iter (currently stubbed out)~~
* ~~ChangedRes~~
  * ~~I skipped this while we sort out #1313. This pr should be adapted to account for whatever we land on there~~.
* ~~The `Archetypes` and `Tables` collections use hashes of sorted lists of component ids to uniquely identify each archetype/table. This hash is then used as the key in a HashMap to look up the relevant ArchetypeId or TableId. (which doesn't handle hash collisions properly)~~
* ~~It is currently unsafe to generate a Query from "World A", then use it on "World B" (despite the api claiming it is safe). We should probably close this gap. This could be done by adding a randomly generated WorldId to each world, then storing that id in each Query. They could then be compared to each other on each `query.do_thing(&world)` operation. This _does_ add an extra branch to each query operation, so I'm open to other suggestions if people have them.~~
* ~~Nested Bundles (if i find time)~~

## Potential Future Work

* Expand WorldCell to support queries.
* Consider not allocating in the empty archetype on `world.spawn()`
  * ex: return something like EntityMutUninit, which turns into EntityMut after an `insert` or `insert_bundle` op
  * this actually regressed performance last time i tried it, but in theory it should be faster
* Optimize SparseSet::insert (see `PERF` comment on insert)
* Replace SparseArray `Option<T>` with T::MAX to cut down on branching
  * would enable cheaper get_unchecked() operations
* upstream fixedbitset optimizations
  * fixedbitset could be allocation free for small block counts (store blocks in a SmallVec)
  * fixedbitset could have a const constructor
* Consider implementing Tags (archetype-specific by-value data that affects archetype identity)
  * ex: ArchetypeA could have `[A, B, C]` table components and `[D(1)]` "tag" component. ArchetypeB could have `[A, B, C]` table components and a `[D(2)]` tag component. The archetypes are different, despite both having D tags because the value inside D is different.
  * this could potentially build on top of the `archetype.unique_components` added in this pr for resource storage.
* Consider reverting `all_tuples` proc macro in favor of the old `macro_rules` implementation
  * all_tuples is more flexible and produces cleaner documentation (the macro_rules version produces weird type parameter orders due to parser constraints)
  * but unfortunately all_tuples also appears to make Rust Analyzer sad/slow when working inside of `bevy_ecs` (does not affect user code)
* Consider "resource queries" and/or "mixed resource and entity component queries" as an alternative to WorldCell
  * this is basically just "systems" so maybe it's not worth it
* Add more world ops
  * `world.clear()`
  * `world.reserve<T: Bundle>(count: usize)`
* Try using the old archetype allocation strategy (allocate new memory on resize and copy everything over). I expect this to improve batch insertion performance at the cost of unbatched performance. But that's just a guess. I'm not an allocation perf pro :)
* Adapt Commands apis for consistency with new World apis

## Benchmarks

key:

* `bevy_old`: bevy `main` branch
* `bevy`: this branch
* `_foreach`: uses an optimized for_each iterator
* `_sparse`: uses sparse set storage (if unspecified assume table storage)
* `_system`: runs inside a system (if unspecified assume test happens via direct world ops)

### Simple Insert (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245573-9c3ce100-7795-11eb-9003-bfd41cd5c51f.png)

### Simple Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245795-ffc70e80-7795-11eb-92fb-3ffad09aabf7.png)

### Fragment Iter (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109245849-0fdeee00-7796-11eb-8d25-eb6b7a682c48.png)

### Sparse Fragmented Iter

Iterate a query that matches 5 entities from a single matching archetype, but there are 100 unmatching archetypes

![image](https://user-images.githubusercontent.com/2694663/109245916-2b49f900-7796-11eb-9a8f-ed89c203f940.png)

### Schedule (from ecs_bench_suite)
![image](https://user-images.githubusercontent.com/2694663/109246428-1fab0200-7797-11eb-8841-1b2161e90fa4.png)

### Add Remove Component (from ecs_bench_suite)

![image](https://user-images.githubusercontent.com/2694663/109246492-39e4e000-7797-11eb-8985-2706bd0495ab.png)

### Add Remove Component Big

Same as the test above, but each entity has 5 "large" matrix components and 1 "large" matrix component is added and removed

![image](https://user-images.githubusercontent.com/2694663/109246517-449f7500-7797-11eb-835e-28b6790daeaa.png)

### Get Component

Looks up a single component value a large number of times

![image](https://user-images.githubusercontent.com/2694663/109246129-87ad1880-7796-11eb-9fcb-c38012aa7c70.png)	2021-03-05 07:54:35 +00:00
Alexander Sepity	d5a7330431	System sets and parallel executor v2 (#1144 ) System sets and parallel executor v2	2021-02-09 12:14:10 -08:00

33 Commits