Search code examples
rustclosureslifetimeunsafe

Consume Vector inside closure without cloning


I have this data structure.

let bucket = HashMap<&str, Vec<&str>>

Given

let cluster = Vec<&str>

I want to expand it from the Vecs on Bucket and I can guarantee that I will just access each key value pair once and the &str in cluster are always a key in bucket.

use std::collections::HashMap;

fn main() {
    let mut bucket: HashMap<&str, Vec<&str>> = HashMap::new();
    bucket.insert("a", vec!["hello", "good morning"]);
    bucket.insert("b", vec!["bye", "ciao"]);
    bucket.insert("c", vec!["good"]);
    let cluster = vec!["a", "b"];
    let cluster2 = vec!["c"];
    let mut clusters = [cluster, cluster2];
    clusters.iter_mut().for_each(|cluster| {
        // I don't like this clone
        let tmp = cluster.clone();
        let tmp = tmp.iter().flat_map(|seq| bucket[seq].
            clone() // I really don't like this other clone
        );
        cluster.extend(tmp);
    });
    println!("{:?}", clusters);
}

This compiles but what I really want to do is drain the vector on bucket since I know I won't access it again.

let tmp = tmp.iter().flat_map(|seq| bucket.get_mut(seq).
    unwrap().drain(..)
);

That gives me a compiler error:

error: captured variable cannot escape `FnMut` closure body
  --> src/main.rs:13:45
   |
4  |       let mut bucket: HashMap<&str, Vec<&str>> = HashMap::new();
   |           ---------- variable defined here
...
13 |           let tmp = tmp.iter().flat_map(|seq| bucket.get_mut(seq).
   |                                             - ^-----
   |                                             | |
   |  ___________________________________________|_variable captured here
   | |                                           |
   | |                                           inferred to be a `FnMut` closure
14 | |             unwrap().drain(..)
   | |______________________________^ returns a reference to a captured variable which escapes the closure body
   |
   = note: `FnMut` closures only have access to their captured variables while they are executing...
   = note: ...therefore, they cannot allow references to captured variables to escape

Do I need to go unsafe? How? And more importantly, is it reasonable to want to remove that clone?


Solution

  • You can eliminate bucket[seq].clone() using std::mem::take():

    let tmp = tmp.iter().flat_map(
        |seq| std::mem::take(bucket.get_mut(seq).unwrap()),
    );
    

    That will transfer ownership of the existing Vec and leave an empty one in the hash map. Since the map remains in a well-defined state, this is 100% safe. Since the empty vector doesn't allocate, it is also efficient. And finally, since you can guarantee that you no longer access that key, it is correct. (Playground.)

    As pointed out in the comments, an alternative is to remove the vector from the hash map, which also transfer the ownership of the vector:

    let tmp = tmp.iter().flat_map(|seq| bucket.remove(seq).unwrap());
    

    The outer cluster.clone() cannot be replaced with take() because you need the old contents. The issue here is that you cannot extend the vector you are iterating over, which Rust doesn't allow in order to implement efficient pointer-based iteration. A simple and effective solution here would be to use indices instead of iteration (playground):

    clusters.iter_mut().for_each(|cluster| {
        let initial_len = cluster.len();
        for ind in 0..initial_len {
            let seq = cluster[ind];
            cluster.extend(std::mem::take(bucket.get_mut(seq).unwrap()));
        }
    });
    

    Of course, with indexing you pay the price of indirection and bound checks, but rustc/llvm is pretty good at removing both when it is safe to do so, and even if it doesn't, indexed access might still be more efficient than cloning. The only way to be sure whether this improves on your original code is to benchmark both versions on production data.