I want to benchmark my Rust programs and was comparing some alternatives for doing that. I noticed, however, that when running a benchmark with cargo bench and the bencher crate, the code runs consistently faster than when running a production build (cargo build --release) of the same code. For example:
Main code:
use dot_product;
const N: usize = 1000000;
use std::time;
fn main() {
    let start = time::Instant::now();
    dot_product::rayon_parallel([1; N].to_vec(), [2; N].to_vec());
    println!("Time: {:?}", start.elapsed());
}
Average time: ~20ms
Benchmark code:
#[macro_use]
extern crate bencher;
use dot_product;
use bencher::Bencher;
const N: usize = 1000000;
fn parallel(bench: &mut Bencher) {
    bench.iter(|| dot_product::rayon_parallel([1; N].to_vec(), [2; N].to_vec()))
}
benchmark_group!(benches, sequential, parallel);
benchmark_main!(benches);
Time: 5,006,199 ns/iter (+/- 1,320,975)
I tried the same with some other programs, and cargo bench gives consistently faster results. Why could this happen?
As the comments suggested, you should use criterion::black_box() on all (final) results in the benchmarking code. This function does nothing and simply returns its only parameter, but it is opaque to the optimizer, so the compiler has to assume the function does something with the input.
Without black_box(), the benchmarking code doesn't actually do anything: the compiler can figure out that the results of your code are unused and that no side effects can be observed. It can then remove all your code during dead-code elimination, and what you end up benchmarking is the benchmarking suite itself.
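To make this concrete, here is a minimal sketch of the idea using std::hint::black_box from the standard library (stable since Rust 1.66) rather than the criterion version. The dot_product function below is a hypothetical sequential stand-in written for this example, not the dot_product::rayon_parallel from the question, and the standard library describes black_box as best-effort, so treat this as an illustration rather than a guarantee.

```rust
use std::hint::black_box;
use std::time::Instant;

const N: usize = 1_000_000;

// Hypothetical stand-in for the function under test (an assumption for
// this sketch; the question's dot_product crate is not shown).
fn dot_product(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = vec![1u64; N];
    let b = vec![2u64; N];

    let start = Instant::now();
    // Wrapping the inputs hides their values from the optimizer, so it
    // cannot precompute the answer from the known constants. Wrapping the
    // result marks it as used, which discourages the compiler from
    // discarding the computation as dead code.
    let result = black_box(dot_product(black_box(&a), black_box(&b)));
    println!("result: {result}, time: {:?}", start.elapsed());
}
```

Inside a benchmark closure the same pattern applies: pass the return value of the function under test through black_box before the closure ends.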