Search code examples
multithreadingperformancerusthashmaplock-free

Performance degradation for high numbers of threads in Rust


I'm creating a latch-free concurrent HashMap in Rust. The throughput curve looks as I would expect up to around 16 threads, at which point performance beings to drop.

Throughput (MOps/sec) vs. Num threads

Throughput (MOps/sec) vs. Num threads

I used a Google Cloud instance with 48 vCPUs and 200GB RAM. I tried enabling/disabling hyperthreading but it had no noticeable result.

Here's how I'm spawning the threads:

for i in 0..num_threads {
    //clone the shared data structure
    let ht = Arc::clone(&ht);

    let handle = thread::spawn(move || {
        for j in 0..adds_per_thread {
            //randomly generate and add a (key, value)
            let key = thread_rng().gen::<u32>();
            let value = thread_rng().gen::<u32>();
            ht.set_item(key, value);
        }
    });

    handles.push(handle);
}

for handle in handles {
    handle.join().unwrap();
}

I'm out of ideas; is my Rust code correct for multithreading?


Solution

  • If all your threads spend all their time hammering on your lock-free data structure, yes you'll get contention once you have enough threads. With enough writers they'll contend for the same cache line in the table more often. (Also, possibly time spent in the PRNG doesn't hide contention for shared bandwidth to cache or DRAM).

    Instead of just a plateau, you'll maybe start hitting more CAS retries and stuff like that, including any contention backoff mechanism. Also, threads will suffer cache misses and even memory-order mis-speculation pipeline clears from some atomic reads; not everything will be atomic RMWs or writes.

    This is not the normal use-case for lock-free data structures; usually you use them with code that does significant work other than hammering on them, so actual contention is low. Also, real workloads for a hashmap are rarely write-only (although that can happen if you just want to deduplicate something).

    Read scales very well with number of readers, but write will hit contention.