I'm generating around 1 GB of data in an application, and the data never changes after it's generated. I then have hundreds of parallel tasks that need to access the data in a read-only fashion, but there's high contention for it.
Does Rust have a way of handling such a scenario efficiently? In other words, at a high level I have something like:
#[tokio::main]
async fn main() {
    let data = generate_large_vector();
    for _ in 1..N {
        // Do something with the data
        tokio::spawn(...);
    }
}
I would not like to copy the whole buffer for each task, since I would run out of memory, and it's also inefficient use of the CPU cache. Protecting the whole vector with a mutex would also add a lot of contention.
Is there an efficient way to handle this? In C++ I would just hand each thread a const reference to the data.
The problem here is the lifetime of the data: tokio::spawn requires a 'static future, so the borrow checker doesn't let me hand a slice over to the tasks. Is there a way to spawn tasks that don't outlive the scope where they started, so that the borrow checker will allow me to pass references?
I like to use std::sync::OnceLock when I have some global immutable data.
use std::sync::OnceLock;

// Written exactly once, then readable from any thread or task
// without locking and without copying.
static BIG_VEC: OnceLock<Vec<u8>> = OnceLock::new();

fn big_vec() -> &'static Vec<u8> {
    // The first caller runs generate_large_vector; every later
    // caller just gets the already-initialized reference.
    BIG_VEC.get_or_init(generate_large_vector)
}

fn generate_large_vector() -> Vec<u8> {
    todo!()
}
async fn task_a() {
    // A plain &'static Vec<u8>: shared, read-only, no copy, no lock.
    let huge = big_vec();
}

async fn task_b() {
    let massive = big_vec();
}
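
Because big_vec() returns a &'static reference, it satisfies the 'static bound on tokio::spawn, so the loop from your question can read the data directly, with no Arc and no mutex. A minimal sketch of how the pieces fit together (the task count of 100 and the dummy workload are placeholders, not part of your original code):

#[tokio::main]
async fn main() {
    // Touch the OnceLock once up front so the 1 GB generation
    // happens before any task is spawned.
    let data = big_vec();
    println!("generated {} bytes", data.len());

    let handles: Vec<_> = (0..100)
        .map(|_| {
            tokio::spawn(async {
                // Every task reads the same buffer through a shared
                // &'static reference; nothing is copied or locked.
                big_vec().len()
            })
        })
        .collect();

    for handle in handles {
        handle.await.unwrap();
    }
}

One trade-off to be aware of: data stored in a static OnceLock is never dropped, so the buffer stays allocated for the rest of the program. That is usually acceptable here, since the tasks need it for the program's lifetime anyway.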