Proper prehashing (#3963)

For some keys, hashing on every lookup is too expensive. Historically in Bevy, we have regrettably done the "wrong" thing in these cases (pre-computing hashes, then re-hashing them) because Rust's built-in hashed collections don't give us the tools we need to do otherwise. Doing this is "wrong" because two different values can result in the same hash. Hashed collections generally get around this by falling back to equality checks on hash collisions. You can't do that if the key _is_ the hash. Additionally, re-hashing a hash increases the odds of collision!

#3959 needs pre-hashing to be viable, so I decided to finally properly solve the problem. The solution involves two different changes:

1. A new generalized "pre-hashing" solution in `bevy_utils`: `Hashed<T>` types, which store a value alongside a pre-computed hash, and `PreHashMap<K, V>` (which uses `Hashed<T>` internally). `PreHashMap` is just an alias for a normal HashMap that uses `Hashed<T>` as the key and a new `PassHash` implementation as the Hasher.
2. Replacing the `std::collections` re-exports in `bevy_utils` with equivalent `hashbrown` impls. Avoiding re-hashes requires the `raw_entry_mut` api, which isn't stabilized yet (and may never be ... `entry_ref` has favor now, but also isn't available yet). If std's HashMap ever provides the tools we need, we can move back to that. The latest version of `hashbrown` adds support for the `entry_ref` api, so we can move to that in preparation for an std migration, if that's the direction they seem to be going in. Note that adding `hashbrown` doesn't increase our dependency count because it was already in our tree.

In addition to providing these core tools, I also ported the "table identity hashing" in `bevy_ecs` to `raw_entry_mut`, which was a particularly egregious case.

The biggest outstanding case is `AssetPathId`, which stores a pre-hash. We need `AssetPathId` to be cheaply clone-able (and ideally Copy), but `Hashed<AssetPath>` requires ownership of the `AssetPath`, which makes cloning ids way more expensive. We could consider doing `Hashed<Arc<AssetPath>>`, but cloning an arc is still a non-trivial expense that needs to be considered. I would like to handle this in a separate PR. And given that we will be re-evaluating the Bevy Assets implementation in the very near future, I'd prefer to hold off until after that conversation is concluded.
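
To make the collision problem concrete, here is a minimal sketch of the idea rather than the code from this PR. It assumes std's `HashMap` and `DefaultHasher` in place of `hashbrown` and aHash. The `Hashed`, `PassHasher` and `PreHashMap` types are simplified stand-ins for the ones added in `bevy_utils`, and `collision_demo` is a hypothetical helper that forces a collision by hand-picking the hash value `7`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Simplified stand-in for `Hashed<T>`: a value stored next to its pre-computed hash.
/// The derived `PartialEq` compares the hash first, then the value.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Hashed<T> {
    hash: u64,
    value: T,
}

impl<T: Hash> Hashed<T> {
    /// Hashes the value once, up front (assumption: std's `DefaultHasher`).
    fn new(value: T) -> Self {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        Self { hash: hasher.finish(), value }
    }
}

impl<T> Hash for Hashed<T> {
    /// Feeds only the stored hash to the hasher; the value is never re-hashed.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

/// Pass-through hasher: returns the `u64` it was given, unchanged.
#[derive(Default)]
struct PassHasher {
    hash: u64,
}

impl Hasher for PassHasher {
    fn write(&mut self, _bytes: &[u8]) {
        panic!("can only hash u64 using PassHasher");
    }

    fn write_u64(&mut self, i: u64) {
        self.hash = i;
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// A map keyed by pre-hashed values, using the pass-through hasher.
type PreHashMap<K, V> = HashMap<Hashed<K>, V, BuildHasherDefault<PassHasher>>;

/// Hypothetical demo: two distinct keys that share the hash value 7.
/// Returns the number of entries in each map after inserting both keys.
fn collision_demo() -> (usize, usize) {
    // The key *is* the hash: the second insert silently overwrites the first.
    let mut by_hash: HashMap<u64, &str> = HashMap::new();
    by_hash.insert(7, "table for key a");
    by_hash.insert(7, "table for key b");

    // The key carries its value: the map falls back to `Eq` and keeps both.
    let mut prehashed: PreHashMap<&str, &str> = PreHashMap::default();
    prehashed.insert(Hashed { hash: 7, value: "a" }, "table for key a");
    prehashed.insert(Hashed { hash: 7, value: "b" }, "table for key b");

    (by_hash.len(), prehashed.len())
}
```

Calling `collision_demo()` returns `(1, 2)`: the hash-keyed map loses an entry on collision, while the pre-hashed map keeps both because equality is still checked on the stored value. This sketch still passes the stored hash through the map's normal hashing path on each lookup, which is cheap with a pass-through hasher. The `raw_entry_mut` api described above goes a step further by supplying the hash directly.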
2022-02-18 03:26:01 +00:00 · 2022-02-18 03:26:01 +00:00 · b3a1db60f2
commit b3a1db60f2
parent c4f132afbf
12 changed files with 190 additions and 185 deletions
--- a/crates/bevy_asset/src/asset_server.rs
+++ b/crates/bevy_asset/src/asset_server.rs
@ -8,10 +8,10 @@ use anyhow::Result;
 use bevy_ecs::system::{Res, ResMut};
 use bevy_log::warn;
 use bevy_tasks::TaskPool;
-use bevy_utils::{HashMap, Uuid};
+use bevy_utils::{Entry, HashMap, Uuid};
 use crossbeam_channel::TryRecvError;
 use parking_lot::{Mutex, RwLock};
-use std::{collections::hash_map::Entry, path::Path, sync::Arc};
+use std::{path::Path, sync::Arc};
 use thiserror::Error;

 /// Errors that occur while loading assets with an `AssetServer`
--- a/crates/bevy_ecs/src/entity/map_entities.rs
+++ b/crates/bevy_ecs/src/entity/map_entities.rs
@ -1,6 +1,5 @@
 use crate::entity::Entity;
-use bevy_utils::HashMap;
-use std::collections::hash_map::Entry;
+use bevy_utils::{Entry, HashMap};
 use thiserror::Error;

 #[derive(Error, Debug)]
--- a/crates/bevy_ecs/src/storage/table.rs
+++ b/crates/bevy_ecs/src/storage/table.rs
@ -3,10 +3,9 @@ use crate::{
    entity::Entity,
    storage::{BlobVec, SparseSet},
 };
-use bevy_utils::{AHasher, HashMap};
+use bevy_utils::HashMap;
 use std::{
    cell::UnsafeCell,
-    hash::{Hash, Hasher},
    ops::{Index, IndexMut},
    ptr::NonNull,
 };
@ -415,7 +414,7 @@ impl Table {
 /// Can be accessed via [`Storages`](crate::storage::Storages)
 pub struct Tables {
    tables: Vec<Table>,
-    table_ids: HashMap<u64, TableId>,
+    table_ids: HashMap<Vec<ComponentId>, TableId>,
 }

 impl Default for Tables {
@ -472,18 +471,21 @@ impl Tables {
        component_ids: &[ComponentId],
        components: &Components,
    ) -> TableId {
-        let mut hasher = AHasher::default();
-        component_ids.hash(&mut hasher);
-        let hash = hasher.finish();
        let tables = &mut self.tables;
-        *self.table_ids.entry(hash).or_insert_with(move || {
-            let mut table = Table::with_capacity(0, component_ids.len());
-            for component_id in component_ids.iter() {
-                table.add_column(components.get_info_unchecked(*component_id));
-            }
-            tables.push(table);
-            TableId(tables.len() - 1)
-        })
+        let (_key, value) = self
+            .table_ids
+            .raw_entry_mut()
+            .from_key(component_ids)
+            .or_insert_with(|| {
+                let mut table = Table::with_capacity(0, component_ids.len());
+                for component_id in component_ids.iter() {
+                    table.add_column(components.get_info_unchecked(*component_id));
+                }
+                tables.push(table);
+                (component_ids.to_vec(), TableId(tables.len() - 1))
+            });
+
+        *value
    }

    pub fn iter(&self) -> std::slice::Iter<'_, Table> {
--- a/crates/bevy_input/src/input.rs
+++ b/crates/bevy_input/src/input.rs
@ -29,13 +29,13 @@ use bevy_ecs::schedule::State;
 /// * Call the [`Input::release`] method for each release event.
 /// * Call the [`Input::clear`] method at each frame start, before processing events.
 #[derive(Debug, Clone)]
-pub struct Input<T> {
+pub struct Input<T: Eq + Hash> {
    pressed: HashSet<T>,
    just_pressed: HashSet<T>,
    just_released: HashSet<T>,
 }

-impl<T> Default for Input<T> {
+impl<T: Eq + Hash> Default for Input<T> {
    fn default() -> Self {
        Self {
            pressed: Default::default(),
--- a/crates/bevy_reflect/Cargo.toml
+++ b/crates/bevy_reflect/Cargo.toml
@ -25,6 +25,7 @@ thiserror = "1.0"
 serde = "1"
 smallvec = { version = "1.6", features = ["serde", "union", "const_generics"], optional = true }
 glam = { version = "0.20.0", features = ["serde"], optional = true }
+hashbrown = { version = "0.11", features = ["serde"], optional = true }

 [dev-dependencies]
 ron = "0.7.0"
--- a/crates/bevy_reflect/src/impls/std.rs
+++ b/crates/bevy_reflect/src/impls/std.rs
@ -6,7 +6,7 @@ use crate::{
 };

 use bevy_reflect_derive::{impl_from_reflect_value, impl_reflect_value};
-use bevy_utils::{AHashExt, Duration, HashMap, HashSet};
+use bevy_utils::{Duration, HashMap, HashSet};
 use serde::{Deserialize, Serialize};
 use std::{
    any::Any,
--- a/crates/bevy_reflect/src/map.rs
+++ b/crates/bevy_reflect/src/map.rs
@ -1,6 +1,6 @@
-use std::{any::Any, collections::hash_map::Entry};
+use std::any::Any;

-use bevy_utils::HashMap;
+use bevy_utils::{Entry, HashMap};

 use crate::{serde::Serializable, Reflect, ReflectMut, ReflectRef};

--- a/crates/bevy_reflect/src/struct_trait.rs
+++ b/crates/bevy_reflect/src/struct_trait.rs
@ -1,6 +1,6 @@
 use crate::{serde::Serializable, Reflect, ReflectMut, ReflectRef};
-use bevy_utils::HashMap;
-use std::{any::Any, borrow::Cow, collections::hash_map::Entry};
+use bevy_utils::{Entry, HashMap};
+use std::{any::Any, borrow::Cow};

 /// A reflected Rust regular struct type.
 ///
--- a/crates/bevy_render/src/render_resource/pipeline_cache.rs
+++ b/crates/bevy_render/src/render_resource/pipeline_cache.rs
@ -10,8 +10,8 @@ use crate::{
 use bevy_app::EventReader;
 use bevy_asset::{AssetEvent, Assets, Handle};
 use bevy_ecs::system::{Res, ResMut};
-use bevy_utils::{tracing::error, HashMap, HashSet};
-use std::{collections::hash_map::Entry, hash::Hash, ops::Deref, sync::Arc};
+use bevy_utils::{tracing::error, Entry, HashMap, HashSet};
+use std::{hash::Hash, ops::Deref, sync::Arc};
 use thiserror::Error;
 use wgpu::{PipelineLayoutDescriptor, ShaderModule, VertexBufferLayout};

--- a/crates/bevy_render/src/texture/texture_cache.rs
+++ b/crates/bevy_render/src/texture/texture_cache.rs
@ -3,7 +3,7 @@ use crate::{
    renderer::RenderDevice,
 };
 use bevy_ecs::prelude::ResMut;
-use bevy_utils::HashMap;
+use bevy_utils::{Entry, HashMap};
 use wgpu::{TextureDescriptor, TextureViewDescriptor};

 /// The internal representation of a [`CachedTexture`] used to track whether it was recently used
@ -39,7 +39,7 @@ impl TextureCache {
        descriptor: TextureDescriptor<'static>,
    ) -> CachedTexture {
        match self.textures.entry(descriptor) {
-            std::collections::hash_map::Entry::Occupied(mut entry) => {
+            Entry::Occupied(mut entry) => {
                for texture in entry.get_mut().iter_mut() {
                    if !texture.taken {
                        texture.frames_since_last_use = 0;
@ -64,7 +64,7 @@ impl TextureCache {
                    default_view,
                }
            }
-            std::collections::hash_map::Entry::Vacant(entry) => {
+            Entry::Vacant(entry) => {
                let texture = render_device.create_texture(entry.key());
                let default_view = texture.create_view(&TextureViewDescriptor::default());
                entry.insert(vec![CachedTextureMeta {
--- a/crates/bevy_utils/Cargo.toml
+++ b/crates/bevy_utils/Cargo.toml
@ -14,6 +14,7 @@ ahash = "0.7.0"
 tracing = {version = "0.1", features = ["release_max_level_info"]}
 instant = { version = "0.1", features = ["wasm-bindgen"] }
 uuid = { version = "0.8", features = ["v4", "serde"] }
+hashbrown = { version = "0.11", features = ["serde"] }

 [target.'cfg(target_arch = "wasm32")'.dependencies]
 getrandom = {version = "0.2.0", features = ["js"]}
--- a/crates/bevy_utils/src/lib.rs
+++ b/crates/bevy_utils/src/lib.rs
@ -3,12 +3,22 @@ pub mod label;

 pub use ahash::AHasher;
 pub use enum_variant_meta::*;
+pub type Entry<'a, K, V> = hashbrown::hash_map::Entry<'a, K, V, RandomState>;
+pub use hashbrown;
+use hashbrown::hash_map::RawEntryMut;
 pub use instant::{Duration, Instant};
 pub use tracing;
 pub use uuid::Uuid;

 use ahash::RandomState;
-use std::{future::Future, pin::Pin};
+use std::{
+    fmt::Debug,
+    future::Future,
+    hash::{BuildHasher, Hash, Hasher},
+    marker::PhantomData,
+    ops::Deref,
+    pin::Pin,
+};

 #[cfg(not(target_arch = "wasm32"))]
 pub type BoxedFuture<'a, T> = Pin<Box<dyn Future<Output = T> + Send + 'a>>;
@ -32,178 +42,170 @@ impl std::hash::BuildHasher for FixedState {
    }
 }

-/// A [`HashMap`][std::collections::HashMap] implementing [`aHash`], a high
+/// A [`HashMap`][hashbrown::HashMap] implementing aHash, a high
 /// speed keyed hashing algorithm intended for use in in-memory hashmaps.
 ///
-/// `aHash` is designed for performance and is NOT cryptographically secure.
-///
-/// # Construction
-///
-/// Users may be surprised when a `HashMap` cannot be constructed with `HashMap::new()`:
-///
-/// ```compile_fail
-/// # fn main() {
-/// use bevy_utils::HashMap;
-///
-/// // Produces an error like "no function or associated item named `new` found [...]"
-/// let map: HashMap<String, String> = HashMap::new();
-/// # }
-/// ```
-///
-/// The standard library's [`HashMap::new`][std::collections::HashMap::new] is
-/// implemented only for `HashMap`s which use the
-/// [`DefaultHasher`][std::collections::hash_map::DefaultHasher], so it's not
-/// available for Bevy's `HashMap`.
-///
-/// However, an empty `HashMap` can easily be constructed using the `Default`
-/// implementation:
-///
-/// ```
-/// # fn main() {
-/// use bevy_utils::HashMap;
-///
-/// // This works!
-/// let map: HashMap<String, String> = HashMap::default();
-/// assert!(map.is_empty());
-/// # }
-/// ```
-///
-/// [`aHash`]: https://github.com/tkaitchuck/aHash
-pub type HashMap<K, V> = std::collections::HashMap<K, V, RandomState>;
+/// aHash is designed for performance and is NOT cryptographically secure.
+pub type HashMap<K, V> = hashbrown::HashMap<K, V, RandomState>;

-pub trait AHashExt {
-    fn with_capacity(capacity: usize) -> Self;
-}
-
-impl<K, V> AHashExt for HashMap<K, V> {
-    /// Creates an empty `HashMap` with the specified capacity with aHash.
-    ///
-    /// The hash map will be able to hold at least `capacity` elements without
-    /// reallocating. If `capacity` is 0, the hash map will not allocate.
-    ///
-    /// # Examples
-    ///
-    /// ```
-    /// use bevy_utils::{HashMap, AHashExt};
-    /// let mut map: HashMap<&str, i32> = HashMap::with_capacity(10);
-    /// assert!(map.capacity() >= 10);
-    /// ```
-    #[inline]
-    fn with_capacity(capacity: usize) -> Self {
-        HashMap::with_capacity_and_hasher(capacity, RandomState::default())
-    }
-}
-
-/// A stable std hash map implementing `aHash`, a high speed keyed hashing algorithm
+/// A stable hash map implementing aHash, a high speed keyed hashing algorithm
 /// intended for use in in-memory hashmaps.
 ///
 /// Unlike [`HashMap`] this has an iteration order that only depends on the order
 /// of insertions and deletions and not a random source.
 ///
-/// `aHash` is designed for performance and is NOT cryptographically secure.
-pub type StableHashMap<K, V> = std::collections::HashMap<K, V, FixedState>;
+/// aHash is designed for performance and is NOT cryptographically secure.
+pub type StableHashMap<K, V> = hashbrown::HashMap<K, V, FixedState>;

-impl<K, V> AHashExt for StableHashMap<K, V> {
-    /// Creates an empty `StableHashMap` with the specified capacity with `aHash`.
-    ///
-    /// The hash map will be able to hold at least `capacity` elements without
-    /// reallocating. If `capacity` is 0, the hash map will not allocate.
-    ///
-    /// # Examples
-    ///
-    /// ```
-    /// use bevy_utils::{StableHashMap, AHashExt};
-    /// let mut map: StableHashMap<&str, i32> = StableHashMap::with_capacity(10);
-    /// assert!(map.capacity() >= 10);
-    /// ```
-    #[inline]
-    fn with_capacity(capacity: usize) -> Self {
-        StableHashMap::with_capacity_and_hasher(capacity, FixedState::default())
-    }
-}
-
-/// A [`HashSet`][std::collections::HashSet] implementing [`aHash`], a high
+/// A [`HashSet`][hashbrown::HashSet] implementing aHash, a high
 /// speed keyed hashing algorithm intended for use in in-memory hashmaps.
 ///
-/// `aHash` is designed for performance and is NOT cryptographically secure.
-///
-/// # Construction
-///
-/// Users may be surprised when a `HashSet` cannot be constructed with `HashSet::new()`:
-///
-/// ```compile_fail
-/// # fn main() {
-/// use bevy_utils::HashSet;
-///
-/// // Produces an error like "no function or associated item named `new` found [...]"
-/// let map: HashSet<String> = HashSet::new();
-/// # }
-/// ```
-///
-/// The standard library's [`HashSet::new`][std::collections::HashSet::new] is
-/// implemented only for `HashSet`s which use the
-/// [`DefaultHasher`][std::collections::hash_map::DefaultHasher], so it's not
-/// available for Bevy's `HashSet`.
-///
-/// However, an empty `HashSet` can easily be constructed using the `Default`
-/// implementation:
-///
-/// ```
-/// # fn main() {
-/// use bevy_utils::HashSet;
-///
-/// // This works!
-/// let map: HashSet<String> = HashSet::default();
-/// assert!(map.is_empty());
-/// # }
-/// ```
-///
-/// [`aHash`]: https://github.com/tkaitchuck/aHash
-pub type HashSet<K> = std::collections::HashSet<K, RandomState>;
+/// aHash is designed for performance and is NOT cryptographically secure.
+pub type HashSet<K> = hashbrown::HashSet<K, RandomState>;

-impl<K> AHashExt for HashSet<K> {
-    /// Creates an empty `HashSet` with the specified capacity with aHash.
-    ///
-    /// The hash set will be able to hold at least `capacity` elements without
-    /// reallocating. If `capacity` is 0, the hash set will not allocate.
-    ///
-    /// # Examples
-    ///
-    /// ```
-    /// use bevy_utils::{HashSet, AHashExt};
-    /// let set: HashSet<i32> = HashSet::with_capacity(10);
-    /// assert!(set.capacity() >= 10);
-    /// ```
-    #[inline]
-    fn with_capacity(capacity: usize) -> Self {
-        HashSet::with_capacity_and_hasher(capacity, RandomState::default())
-    }
-}
-
-/// A stable std hash set implementing `aHash`, a high speed keyed hashing algorithm
+/// A stable hash set implementing aHash, a high speed keyed hashing algorithm
 /// intended for use in in-memory hashmaps.
 ///
 /// Unlike [`HashSet`] this has an iteration order that only depends on the order
 /// of insertions and deletions and not a random source.
 ///
-/// `aHash` is designed for performance and is NOT cryptographically secure.
-pub type StableHashSet<K> = std::collections::HashSet<K, FixedState>;
+/// aHash is designed for performance and is NOT cryptographically secure.
+pub type StableHashSet<K> = hashbrown::HashSet<K, FixedState>;

-impl<K> AHashExt for StableHashSet<K> {
-    /// Creates an empty `StableHashSet` with the specified capacity with `aHash`.
-    ///
-    /// The hash set will be able to hold at least `capacity` elements without
-    /// reallocating. If `capacity` is 0, the hash set will not allocate.
-    ///
-    /// # Examples
-    ///
-    /// ```
-    /// use bevy_utils::{StableHashSet, AHashExt};
-    /// let set: StableHashSet<i32> = StableHashSet::with_capacity(10);
-    /// assert!(set.capacity() >= 10);
-    /// ```
+/// A pre-hashed value of a specific type. Pre-hashing enables memoization of hashes that are expensive to compute.
+/// It also enables faster [`PartialEq`] comparisons by short circuiting on hash equality.
+/// See [`PassHash`] and [`PassHasher`] for a "pass through" [`BuildHasher`] and [`Hasher`] implementation
+/// designed to work with [`Hashed`]
+/// See [`PreHashMap`] for a hashmap pre-configured to use [`Hashed`] keys.
+pub struct Hashed<V, H = FixedState> {
+    hash: u64,
+    value: V,
+    marker: PhantomData<H>,
+}
+
+impl<V: Hash, H: BuildHasher + Default> Hashed<V, H> {
+    /// Pre-hashes the given value using the [`BuildHasher`] configured in the [`Hashed`] type.
+    pub fn new(value: V) -> Self {
+        let builder = H::default();
+        let mut hasher = builder.build_hasher();
+        value.hash(&mut hasher);
+        Self {
+            hash: hasher.finish(),
+            value,
+            marker: PhantomData,
+        }
+    }
+
+    /// The pre-computed hash.
    #[inline]
-    fn with_capacity(capacity: usize) -> Self {
-        StableHashSet::with_capacity_and_hasher(capacity, FixedState::default())
+    pub fn hash(&self) -> u64 {
+        self.hash
+    }
+}
+
+impl<V, H> Hash for Hashed<V, H> {
+    #[inline]
+    fn hash<R: Hasher>(&self, state: &mut R) {
+        state.write_u64(self.hash);
+    }
+}
+
+impl<V, H> Deref for Hashed<V, H> {
+    type Target = V;
+
+    #[inline]
+    fn deref(&self) -> &Self::Target {
+        &self.value
+    }
+}
+
+impl<V: PartialEq, H> PartialEq for Hashed<V, H> {
+    /// A fast impl of [`PartialEq`] that first checks that `other`'s pre-computed hash
+    /// matches this value's pre-computed hash.
+    #[inline]
+    fn eq(&self, other: &Self) -> bool {
+        self.hash == other.hash && self.value.eq(&other.value)
+    }
+}
+
+impl<V: Debug, H> Debug for Hashed<V, H> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("Hashed")
+            .field("hash", &self.hash)
+            .field("value", &self.value)
+            .finish()
+    }
+}
+
+impl<V: Clone, H> Clone for Hashed<V, H> {
+    #[inline]
+    fn clone(&self) -> Self {
+        Self {
+            hash: self.hash,
+            value: self.value.clone(),
+            marker: PhantomData,
+        }
+    }
+}
+
+impl<V: Eq, H> Eq for Hashed<V, H> {}
+
+/// A [`BuildHasher`] that results in a [`PassHasher`].
+#[derive(Default)]
+pub struct PassHash;
+
+impl BuildHasher for PassHash {
+    type Hasher = PassHasher;
+
+    fn build_hasher(&self) -> Self::Hasher {
+        PassHasher::default()
+    }
+}
+
+#[derive(Debug, Default)]
+pub struct PassHasher {
+    hash: u64,
+}
+
+impl Hasher for PassHasher {
+    fn write(&mut self, _bytes: &[u8]) {
+        panic!("can only hash u64 using PassHasher");
+    }
+
+    #[inline]
+    fn write_u64(&mut self, i: u64) {
+        self.hash = i;
+    }
+
+    #[inline]
+    fn finish(&self) -> u64 {
+        self.hash
+    }
+}
+
+/// A [`HashMap`] pre-configured to use [`Hashed`] keys and [`PassHash`] passthrough hashing.
+pub type PreHashMap<K, V> = hashbrown::HashMap<Hashed<K>, V, PassHash>;
+
+/// Extension methods intended to add functionality to [`PreHashMap`].
+pub trait PreHashMapExt<K, V> {
+    /// Tries to get or insert the value for the given `key` using the pre-computed hash first.
+    /// If the [`PreHashMap`] does not already contain the `key`, it will clone it and insert
+    /// the value returned by `func`.  
+    fn get_or_insert_with<F: FnOnce() -> V>(&mut self, key: &Hashed<K>, func: F) -> &mut V;
+}
+
+impl<K: Hash + Eq + PartialEq + Clone, V> PreHashMapExt<K, V> for PreHashMap<K, V> {
+    #[inline]
+    fn get_or_insert_with<F: FnOnce() -> V>(&mut self, key: &Hashed<K>, func: F) -> &mut V {
+        let entry = self
+            .raw_entry_mut()
+            .from_key_hashed_nocheck(key.hash(), key);
+        match entry {
+            RawEntryMut::Occupied(entry) => entry.into_mut(),
+            RawEntryMut::Vacant(entry) => {
+                let (_, value) = entry.insert_hashed_nocheck(key.hash(), key.clone(), func());
+                value
+            }
+        }
    }
 }