Search code examples
rustvector

How to efficiently check a Vec<u8> to see if it's all zeros?


I have many 4KiB buffers, which have a 50% chance of containing only zero values. The non-zero buffers will typically have a non-zero byte early in the buffer.

fn is_zero(buf: &Vec<u8>) -> bool {
    for byte in buf.into_iter() {
        if *byte != 0 {
            return false;
        }
    }
    return true;
}

Is this a performant way of checking in Rust, with --release? (I am processing many GBs of data.)

(In the C version, I cast the buffer to unsigned long long before checking. That probably wasn't the best I could do, given SSE etc.)


Solution

  • You an use align_to to convert the slice of u8 into a slice of u128, making the comparison more efficient:

    fn is_zero(buf: &[u8]) -> bool {
        let (prefix, aligned, suffix) = unsafe { buf.align_to::<u128>() };
    
        prefix.iter().all(|&x| x == 0)
            && suffix.iter().all(|&x| x == 0)
            && aligned.iter().all(|&x| x == 0)
    }
    

    Running a simple benchmark on my machine shows 16x performance gains!

    #![feature(test)]
    extern crate test;
    
    fn v() -> Vec<u8> {
        std::iter::repeat(0).take(1000000).collect()
    }
    
    fn is_zero(buf: &[u8]) -> bool {
        buf.into_iter().all(|&b| b == 0)
    }
    
    fn is_zero_aligned(buf: &[u8]) -> bool {
        let (prefix, aligned, suffix) = unsafe { buf.align_to::<u128>() };
    
        prefix.iter().all(|&x| x == 0)
            && suffix.iter().all(|&x| x == 0)
            && aligned.iter().all(|&x| x == 0)
    }
    
    #[bench]
    fn bench_is_zero(b: &mut test::Bencher) {
        let v = test::black_box(v());
        b.iter(|| is_zero(&v[..]))
    }
    
    #[bench]
    fn bench_is_zero_aligned(b: &mut test::Bencher) {
        let v = test::black_box(v());
        b.iter(|| is_zero_aligned(&v[..]))
    }
    
    running 2 tests
    test tests::bench_is_zero         ... bench:     455,975 ns/iter (+/- 414)
    test tests::bench_is_zero_aligned ... bench:      28,615 ns/iter (+/- 116)
    

    Depending on your machine, different integer types (u64) may yield better performance.

    Thanks to @Globi on the Rust discord server for the idea