How can I generate a Bernoulli random integer array in rust?

I am trying to create a random 2D array in which each entry is 0 with probability p and 1 otherwise. In python, numpy makes it simple:

rand_arr = (np.random.rand(nrows, ncols) < p).astype(np.dtype('uint8'))

In rust, I wrote something like this

use rand::Rng;
use rand::distributions::Bernoulli;
use ndarray::{Array2};

// Create a random number generator
let mut rng = rand::thread_rng();

// Create a Bernoulli distribution with the given probability
let bernoulli = Bernoulli::new(p).unwrap();

// Generate a 2D array of random boolean values using the Bernoulli distribution
let rand_arr: Array2<bool> = Array2::from_shape_fn((nrows, ncols), |_| {
    rng.sample(bernoulli)
});

// convert bools to 0 and 1
let rand_arr = rand_arr.mapv(|b| if b { 1 } else { 0 });

This is awfully slow, in fact slower than the python version! Which tells me I am doing something wrong. How can I write a faster implementation of this?

Solution

Before starting, allow me to turn your code into something testable.

use ndarray::Array2;
use rand::distributions::Bernoulli;
use rand::Rng;
use std::hint::black_box;
use std::time::{Duration, Instant};

fn main() {
    let nrows = 200;
    let ncols = 400;
    let p = 0.2;
    let bernoulli = Bernoulli::new(p).unwrap();

    let (rand_arr, t) = original(bernoulli, nrows, ncols);

    println!("{t:?}");
    black_box(rand_arr);
}

fn original(bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
    let mut rng = rand::thread_rng();

    let t = Instant::now();
    let rand_arr: Array2<bool> = Array2::from_shape_fn((nrows, ncols), |_| rng.sample(bernoulli));
    let rand_arr = rand_arr.mapv(|b| if b { 1 } else { 0 });

    (rand_arr, t.elapsed())
}

This reports about 302 microseconds when run in release mode on my computer. That'll be the baseline. We can now plug in different functions for original to time them.

Note that if you don't specify an integer type, Rust will use i32. I've picked u8 to match the Python version.

First, I'll clean up your code without changing the meaning.

fn original_clean(bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
    let rng = rand::thread_rng();
    let mut iter = rng.sample_iter(bernoulli);

    let t = Instant::now();
    let rand_arr =
        Array2::from_shape_simple_fn((nrows, ncols), || if iter.next().unwrap() { 1 } else { 0 });

    (rand_arr, t.elapsed())
}

This uses sample_iter which can be more efficient than calling sample multiple times. It also uses from_shape_simple_fn since you're not using the index. But most importantly, it does the mapping from booleans to integers in-place.

This one takes 237 microseconds, which is substantial, but it would be useful to know how the time is split between the array creation and the random generation.

fn not_rand(_bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
    let t = Instant::now();
    let rand_arr = Array2::from_shape_simple_fn((nrows, ncols), || 1);

    (rand_arr, t.elapsed())
}

This generates an array of all ones. It takes 24 microseconds. So 90% of the time is spent in random generation. I went to find out what RNG NumPy uses.

default_rng currently uses PCG64 as the default BitGenerator

Let's see what rand uses.

The current algorithm used is the ChaCha block cipher with 12 rounds.

I'm not super familiar with the performance of different RNGs, but ChaCha is a cryptographically secure RNG, so it's probably slower. There's a crate for PCG-64 in rust: rand_pcg. I made a function using that to plug into the timer.

fn pcg64(bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
    use rand::SeedableRng;
    let rng = rand_pcg::Pcg64::from_entropy();
    let mut iter = rng.sample_iter(bernoulli);

    let t = Instant::now();
    let rand_arr =
        Array2::from_shape_simple_fn((nrows, ncols), || if iter.next().unwrap() { 1 } else { 0 });

    (rand_arr, t.elapsed())
}

This runs in 132 microseconds! You may even be able to use a cheaper RNG (in both Rust and Python) if your use case allows.

If this is still slower than NumPy, I wouldn't be that surprised. NumPy has its performance-intensive code in C, so it isn't burdened by Python's usual slowness. However, the Rust version should beat Python if the logic outside of NumPy ever takes a substantial portion of time.

Here's all of that on the playground, except for the PCG because the crate isn't available.

I went ahead and tested the Python just to check. It's not timing exactly the same thing, but it should give an idea of how the Rust version is doing.

Edit: I didn't notice that random.rand uses the old MT19937 RNG, so I've updated this to use PCG through default_rng. MT19937 is about 60% slower. I also tried Rust's mt19937 but it was quite slow.

import numpy as np
import time

nrows = 200
ncols = 400
p = 0.2

a = time.perf_counter()
rand_arr = (np.random.default_rng().random((nrows, ncols), dtype='float32') < p).astype(np.dtype('uint8'))
b = time.perf_counter()

print((b - a) * 1000 * 1000, "microseconds")

This takes 370 microseconds.