Search code examples
rustparallel-processingfoldrayon

How can I create a HashMap using Rayon's parallel fold?


I am trying to create a HashMap using functional programming and utilizing parallelization by rayon.

If I try this without rayon, it works:

use std::collections::HashMap;

fn main() {
    let nums = [1, 2, 1, 2, 1, 2];
    let result: HashMap<i32, i32> =
        nums.iter()
            .filter(|x| *x % 2 == 0)
            .fold(HashMap::new(), |mut acc, x| {
                *acc.entry(*x).or_insert(0) += 1;
                acc
            });

    println!("{:?}", result);
}

If I try to use multiple cores by switching from iter() to par_iter(), I get an error:

use rayon::prelude::*; // 1.5.1
use std::collections::HashMap;

fn main() {
    let nums = [1, 2, 1, 2, 1, 2];
    let result: HashMap<i32, i32> =
        nums.par_iter()
            .filter(|x| *x % 2 == 0)
            .fold(HashMap::new(), |mut acc, x| {
                *acc.entry(*x).or_insert(0) += 1;
                acc
            });

    println!("{:?}", result);
}
error[E0277]: expected a `Fn<()>` closure, found `HashMap<_, _>`
 --> src/main.rs:9:19
  |
9 |             .fold(HashMap::new(), |mut acc, x| {
  |                   ^^^^^^^^^^^^^^ expected an `Fn<()>` closure, found `HashMap<_, _>`
  |
  = help: the trait `Fn<()>` is not implemented for `HashMap<_, _>`
  = note: wrap the `HashMap<_, _>` in a closure with no arguments: `|| { /* code */ }`

error[E0308]: mismatched types
  --> src/main.rs:7:9
   |
6  |       let result: HashMap<i32, i32> =
   |                   ----------------- expected due to this
7  | /         nums.par_iter()
8  | |             .filter(|x| *x % 2 == 0)
9  | |             .fold(HashMap::new(), |mut acc, x| {
10 | |                 *acc.entry(*x).or_insert(0) += 1;
11 | |                 acc
12 | |             });
   | |______________^ expected struct `HashMap`, found struct `Fold`
   |
   = note: expected struct `HashMap<i32, i32>`
              found struct `Fold<rayon::iter::Filter<rayon::slice::Iter<'_, {integer}>, [closure@src/main.rs:8:21: 8:36]>, HashMap<_, _>, _>`

Obviously, Rust tries to stop me from doing something stupid involving race conditions, but how would I build a HashMap inside a par_iter()?


Solution

  • Rayon's fold creates intermediate items (cannot be known how many). From the documentation (emphasis mine):

    Parallel fold is similar to sequential fold except that the sequence of items may be subdivided before it is folded. Consider a list of numbers like 22 3 77 89 46. If you used sequential fold to add them (fold(0, |a,b| a+b), you would wind up first adding 0 + 22, then 22 + 3, then 25 + 77, and so forth. The parallel fold works similarly except that it first breaks up your list into sublists, and hence instead of yielding up a single sum at the end, it yields up multiple sums. The number of results is nondeterministic, as is the point where the breaks occur.

    You need to reduce those intermediate items into the final one:

    use rayon::prelude::*; // 1.5.1
    use std::collections::HashMap;
    
    fn main() {
        let nums = [1, 2, 1, 2, 1, 2];
        let result: HashMap<i32, i32> = nums
            .par_iter()
            .filter(|x| *x % 2 == 0)
            .fold(HashMap::new, |mut acc, x| {
                *acc.entry(*x).or_insert(0) += 1;
                acc
            })
            .reduce_with(|mut m1, m2| {
                for (k, v) in m2 {
                    *m1.entry(k).or_default() += v;
                }
                m1
            })
            .unwrap();
    
        println!("{:?}", result);
    }
    

    Playground

    Also note that the first parameter of Rayon's fold is a function that creates the empty HashMap, not an empty HashMap like the standard library's fold.