Search code examples
data-structuresrusthashmapheap-memory

How does iterating over HashMaps work in memory: Rust


I know how to iterate over a HashMap in Rust, however, I am a little confused about how this works in memory. How do we iterate over values that are not stored sequentially in memory? A detailed explanation of the code below at the heap and stack level would be much appreciated.

use std::collections::HashMap;

let name = vec![String::from("Charlie"), String::from("Winston"), String::from("Brian"), String::from("Jack")];
let age = vec![50, 5, 7, 21];

let mut people_ages: HashMap<String, i32> = name.into_iter().zip(age.into_iter()).collect();


for (key, value) in &people_ages {
    println!("{}: {}", key, value);
}

Solution

  • At the end of the intro of the documentation, it is mentioned that the implementation relies on a C++ implementation of SwissTables. This page contains illustrations about two variants: « flat » and « node » based.

    The main difference between these two variants is pointer stability. In the « node » based version, the key-value pairs, once inserted, keep their address in memory even if the hash is reorganised. In the « flat » version, some insertions/removals can make the previous key-value pairs be moved in memory.

    When it comes to the Rust implementation, I am not experienced enough to be certain of any specific detail, but I tried this simple example based on yours.

    use std::collections::HashMap;
    
    fn main() {
        let name = vec![
            String::from("Charlie"),
            String::from("Winston"),
            String::from("Brian"),
            String::from("Jack"),
        ];
        let age = vec![50, 5, 7, 21];
        let mut people_ages: HashMap<String, i32> =
            name.into_iter().zip(age.into_iter()).collect();
        let mut keys = Vec::new();
        let mut values = Vec::new();
        for (key, value) in &people_ages {
            keys.push(key);
            values.push(value);
            let key_addr = key as *const String as usize;
            let value_addr = value as *const i32 as usize;
            println!("{:x} {:x} {}: {}", key_addr, value_addr, key, value);
        }
        // people_ages.insert("Bob".to_owned(), 4); // mutable and immutable borrow
        println!("keys: {:?}", keys);
        println!("values: {:?}", values);
    }
    /*
    55e08ff8bd40 55e08ff8bd58 Brian: 7
    55e08ff8bd20 55e08ff8bd38 Charlie: 50
    55e08ff8bd00 55e08ff8bd18 Winston: 5
    55e08ff8bce0 55e08ff8bcf8 Jack: 21
    keys: ["Brian", "Charlie", "Winston", "Jack"]
    values: [7, 50, 5, 21]
    */
    

    The commented out line (insertion) is rejected because we cannot alter the hashmap while keeping references to its content. Thus, I guess (I'm not certain) that the implementation does not rely on the « node » based variant since we cannot take benefit of the pointer stability it provides (due to the ownership model in Rust), and probably it relies on the « flat » variant.

    This means that we can expect that the key-value pairs associated with the same hash are tightly packed in memory, and iterating over them should be very similar to iterating over a vector: regular progression (with some skips however) very friendly with cache prefetch. Printing the addresses tends to confirm that guess (however the test is not complete enough), and shows a backward progression.