Search code examples
multithreadingrustmutex

What the correct way working with threads with arc mutex?


I have some urls, then i divide it with chunks and iterate them concurrently, saving results in vec using cloning:

pub fn collect_parm_list(url_list: Vec<String>) -> Result<Vec<String>, Box<dyn std::error::Error>> {
let shared_id_list = Arc::new(Mutex::new(vec![]));
for chunk in url_list.chunks(20) {
    let mut handles = vec![];
    for url in chunk {
        let id_list = Arc::clone(&shared_id_list);
        let handle = thread::spawn(move || {
            // total: optionSale -> result -> total
            let mut id_list_v = id_list.lock().unwrap();
            id_list_v.push(url.to_string())
        });

        handles.push(handle);
    }
    // Wait for all threads to complete
    for handle in handles {
        handle.join().unwrap();
    }
}

let x = shared_id_list.lock().unwrap();
Ok(x.to_vec())
}

I have error with borrowing data:

url_list does not live long enough borrowed value does not live long enough

at: for chunk in url_list.chunks(20) {


Solution

  • Your problem is that .chunks() does not return an iterator of owned things, it returns a slice. What that means in this case is that while you have a Vec<String>, .chunks() gives you an iterator of &[String], which means that url is a &String. You then attempt to give this to a closure you pass to thread::spawn, but closures given to that function are required to be 'static meaning they do not borrow from any locals. This is the source of the error: the closure you pass to thread::spawn indirectly borrows from url_list.

    (Aside: The compiler admittedly doesn't do a good job of directing you to the thread::spawn invocation as the source of the problem. argument requires that "url_list" is borrowed for "'static" indicates that it does know the problem is the argument to thread::spawn but I don't know why it doesn't explicitly highlight that invocation. This seems like a bug in the compiler diagnostics to me.)

    Note that this is problematic even though you join all of the thread handles. The borrow checker doesn't know that this happens nor what it means; all it knows is that the closure given to thread::spawn must be 'static and it's not. (And even if it could know that, the thread running collect_parm_list could panic when unwrapping the result of .join(), which would cause it to drop url_list and then the still-running threads have a use-after-free bug!)

    A simple fix would be to add let url = url.to_owned(); before you spawn the thread, which will create an independent copy of url and then the closure will capture that instead.

    Alternatively, if you had some way to inform the borrow checker that the threads cannot outlive url_list even if the current thread panics then it would be safe. This is exactly what scoped threads accomplish! Using scoped threads you can avoid making a copy of url, but it also lets you eliminate the Arc layer from your output list. (You still need the Mutex to synchronize access.) This also allows you to use .into_inner() on the Mutex when the threads are done since you know at that point no threads still have access, so you can consume it without waiting for a lock.

    The code for this approach winds up being much simpler, too:

    pub fn collect_parm_list(url_list: Vec<String>) -> Vec<String> {
        let id_list = Mutex::new(vec![]);
        thread::scope(|s| {
            for chunk in url_list.chunks(20) {
                for url in chunk {
                    s.spawn(|| {
                        // total: optionSale -> result -> total
                        let mut id_list_v = id_list.lock().unwrap();
                        id_list_v.push(url.to_string())
                    });
                }
            }
        });
    
        id_list.into_inner().unwrap()
    }