Why is running cargo bench faster than running a release build?

I want to benchmark my Rust programs and was comparing some alternatives for doing that. I noticed, however, that when I run a benchmark with cargo bench and the bencher crate, the code runs consistently faster than the same code in a production build (cargo build --release). For example:

Main code:

use dot_product;
const N: usize = 1000000;

use std::time;
fn main() {
    let start = time::Instant::now();
    dot_product::rayon_parallel([1; N].to_vec(), [2; N].to_vec());
    println!("Time: {:?}", start.elapsed());
}

Average time: ~20ms

Benchmark code:

#[macro_use]
extern crate bencher;

use dot_product;

use bencher::Bencher;

const N: usize = 1000000;

fn parallel(bench: &mut Bencher) {
    bench.iter(|| dot_product::rayon_parallel([1; N].to_vec(), [2; N].to_vec()))
}

// `sequential` is another benchmark function, not shown in this excerpt.
benchmark_group!(benches, sequential, parallel);
benchmark_main!(benches);

Time: 5,006,199 ns/iter (+/- 1,320,975)

I tried the same with some other programs and cargo bench gives consistently faster results. Why could this happen?

Solution

As the comments suggested, you should pass all (final) results in the benchmarking code through criterion::black_box(). This function does nothing except return its only argument, but it is opaque to the optimizer, so the compiler has to assume that the value is actually used.

Without black_box(), the benchmarking code doesn't actually do anything: the compiler can figure out that the results of your code are unused and that no side effects can be observed. It therefore removes all of your code during dead-code elimination, and what you end up benchmarking is the benchmarking suite itself.
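
As a rough illustration of the idea, here is a minimal sketch of a hand-rolled timing loop that guards both the inputs and the result. It makes a few assumptions that are not part of the question: it uses std::hint::black_box from the standard library (stable since Rust 1.66) rather than criterion::black_box, and dot_product and time_per_call are hypothetical helpers, with dot_product being a sequential stand-in for dot_product::rayon_parallel.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

const N: usize = 1_000_000;

// Hypothetical sequential stand-in for `dot_product::rayon_parallel`.
fn dot_product(a: &[u64], b: &[u64]) -> u64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// Hypothetical helper: runs `iters` calls and returns the mean time per call.
// `iters` must be non-zero, because the total is divided by it.
fn time_per_call(a: &[u64], b: &[u64], iters: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box on the inputs keeps the compiler from precomputing the
        // result; black_box on the output keeps it from treating the
        // computation as unused and deleting it.
        black_box(dot_product(black_box(a), black_box(b)));
    }
    start.elapsed() / iters
}

fn main() {
    // The vectors are built once, outside the timed loop, so that
    // allocation is not counted as part of each call.
    let a = vec![1u64; N];
    let b = vec![2u64; N];
    println!("per call: {:?}", time_per_call(&a, &b, 100));
}
```

Building the inputs outside the loop is a design choice of this sketch: both snippets in the question create the vectors inside the measured region, so their timings include allocation as well as the dot product.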