Tags: rust, rust-tokio

How to efficiently pass a large chunk of data to Tokio tasks?


I'm generating around 1GB of data in an application which never changes afterwards. I then have hundreds of parallel tasks that need to access the data in a read-only fashion, but there's high contention for it.

Does Rust have a way of handling such a scenario efficiently? In other words, at a high level I have something like:

#[tokio::main]
async fn main() {
    let data = generate_large_vector();

    for _ in 0..N {
        // Do something with the data
        tokio::spawn(...);
    }
}

I would rather not copy the whole buffer for each task: I would run out of memory, and it is also an inefficient use of the CPU cache. Protecting the whole vector with a mutex adds lots of contention.

Is there an efficient way to handle this? In C++ I would just pass the data as a const reference to each thread.

The problem here is the lifetime of the data: spawned tasks may outlive `main`'s stack frame, so the borrow checker doesn't let me hand over a slice to it. Is there a way to spawn tasks that don't outlive the scope where they started, so that the borrow checker will allow me to pass references?


Solution

  • I like to use std::sync::OnceLock when I have some global immutable data.

    use std::sync::OnceLock;
    
    // Initialized exactly once, then shared immutably for the rest
    // of the program's lifetime.
    static BIG_VEC: OnceLock<Vec<u8>> = OnceLock::new();
    
    fn big_vec() -> &'static Vec<u8> {
        // The first caller runs generate_large_vector; every later
        // caller gets the cached &'static reference without locking.
        BIG_VEC.get_or_init(generate_large_vector)
    }
    
    fn generate_large_vector() -> Vec<u8> {
        todo!()
    }
    
    // A &'static reference is Copy and Send, so every task can hold
    // one without any synchronization or copying of the buffer.
    async fn task_a() {
        let huge = big_vec();
    }
    
    async fn task_b() {
        let massive = big_vec();
    }