I am trying to create a random 2D array in which each entry is 0 with probability p
and 1 otherwise. In python, numpy makes it simple:
rand_arr = (np.random.rand(nrows, ncols) < p).astype(np.dtype('uint8'))
In rust, I wrote something like this
use rand::Rng;
use rand::distributions::Bernoulli;
use ndarray::{Array2};
// Create a random number generator
let mut rng = rand::thread_rng();
// Create a Bernoulli distribution with the given probability
let bernoulli = Bernoulli::new(p).unwrap();
// Generate a 2D array of random boolean values using the Bernoulli distribution
let rand_arr: Array2<bool> = Array2::from_shape_fn((nrows, ncols), |_| {
rng.sample(bernoulli)
});
// convert bools to 0 and 1
let rand_arr = rand_arr.mapv(|b| if b { 1 } else { 0 });
This is awfully slow, in fact slower than the python version! Which tells me I am doing something wrong. How can I write a faster implementation of this?
Before starting, allow me to turn your code into something testable.
use ndarray::Array2;
use rand::distributions::Bernoulli;
use rand::Rng;
use std::hint::black_box;
use std::time::{Duration, Instant};
fn main() {
let nrows = 200;
let ncols = 400;
let p = 0.2;
let bernoulli = Bernoulli::new(p).unwrap();
let (rand_arr, t) = original(bernoulli, nrows, ncols);
println!("{t:?}");
black_box(rand_arr);
}
fn original(bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
let mut rng = rand::thread_rng();
let t = Instant::now();
let rand_arr: Array2<bool> = Array2::from_shape_fn((nrows, ncols), |_| rng.sample(bernoulli));
let rand_arr = rand_arr.mapv(|b| if b { 1 } else { 0 });
(rand_arr, t.elapsed())
}
This reports about 302 microseconds when run in release mode on my computer. That'll be the baseline. We can now plug in different functions for original
to time them.
Note that if you don't specify an integer type, Rust will use i32
. I've picked u8
to match the Python version.
First, I'll clean up your code without changing the meaning.
fn original_clean(bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
let rng = rand::thread_rng();
let mut iter = rng.sample_iter(bernoulli);
let t = Instant::now();
let rand_arr =
Array2::from_shape_simple_fn((nrows, ncols), || if iter.next().unwrap() { 1 } else { 0 });
(rand_arr, t.elapsed())
}
This uses sample_iter
which can be more efficient than calling sample
multiple times. It also uses from_shape_simple_fn
since you're not using the index. But most importantly, it does the mapping from booleans to integers in-place.
This one takes 237 microseconds, which is substantial, but it would be useful to know how the time is split between the array creation and the random generation.
fn not_rand(_bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
let t = Instant::now();
let rand_arr = Array2::from_shape_simple_fn((nrows, ncols), || 1);
(rand_arr, t.elapsed())
}
This generates an array of all ones. It takes 24 microseconds. So 90% of the time is spent in random generation. I went to find out what RNG NumPy uses.
default_rng
currently usesPCG64
as the defaultBitGenerator
Let's see what rand uses.
The current algorithm used is the ChaCha block cipher with 12 rounds.
I'm not super familiar with the performance of different RNGs, but ChaCha is a cryptographically secure RNG, so it's probably slower. There's a crate for PCG-64 in rust: rand_pcg. I made a function using that to plug into the timer.
fn pcg64(bernoulli: Bernoulli, nrows: usize, ncols: usize) -> (Array2<u8>, Duration) {
use rand::SeedableRng;
let rng = rand_pcg::Pcg64::from_entropy();
let mut iter = rng.sample_iter(bernoulli);
let t = Instant::now();
let rand_arr =
Array2::from_shape_simple_fn((nrows, ncols), || if iter.next().unwrap() { 1 } else { 0 });
(rand_arr, t.elapsed())
}
This runs in 132 microseconds! You may even be able to use a cheaper RNG (in both Rust and Python) if your use case allows.
If this is still slower than NumPy, I wouldn't be that surprised. NumPy has its performance-intensive code in C, so it isn't burdened by Python's usual slowness. However, the Rust version should beat Python if the logic outside of NumPy ever takes a substantial portion of time.
Here's all of that on the playground, except for the PCG because the crate isn't available.
I went ahead and tested the Python just to check. It's not timing exactly the same thing, but it should give an idea of how the Rust version is doing.
Edit: I didn't notice that random.rand
uses the old MT19937 RNG, so I've updated this to use PCG through default_rng
. MT19937 is about 60% slower. I also tried Rust's mt19937 but it was quite slow.
import numpy as np
import time
nrows = 200
ncols = 400
p = 0.2
a = time.perf_counter()
rand_arr = (np.random.default_rng().random((nrows, ncols), dtype='float32') < p).astype(np.dtype('uint8'))
b = time.perf_counter()
print((b - a) * 1000 * 1000, "microseconds")
This takes 370 microseconds.