Search code examples
performancerustiterator

How to improve `extend` performance for custom Iterator


My app contains a custom RingBuffer implementation because I need a few functions that are not provided by VecDeque<u8>.

A performance critical path for me is to extend a Vec<u8> with a subrange of this ringbuffer. So I benchmarked and found that the default VecDeque range iter performs significantly better than mine, despite having literally the same datastructure.

I checked in godbolt and as expected, the built-in one gets cool auto vectorization while my one doesn't.

While i will probably just end up writing manual extend_from_slice / std::ptr::copy's for my usecase, I was interested in finding out whether there was anything I could have done do to get comparable performance to the built-in Iterator. So far I tried using &u8 for the iter but that was worse.

Is there some additional trait I could implement for my Iter or something?

My code looks something like this:

use std::collections::VecDeque;

pub struct RingbufIter<'a> {
    s1: std::slice::Iter<'a, u8>,
    s2: std::slice::Iter<'a, u8>,
}
impl<'a> Iterator for RingbufIter<'a> {
    type Item = u8;
    fn next(&mut self) -> Option<Self::Item> {
        match self.s1.next().copied() {
            Some(b) => Some(b),
            None => {
                // swap to reduce branches on subsequent `next` calls
                std::mem::swap(&mut self.s1, &mut self.s2);
                self.s1.next().copied()
            }
        }
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        let (s1_min, s1_max) = self.s1.size_hint();
        let (s2_min, s2_max) = self.s2.size_hint();
        (s1_min + s2_min, Some(s1_max.unwrap() + s2_max.unwrap()))
    }
}

#[no_mangle]
pub fn extend_from_custom_iter(src: RingbufIter, sink: &mut Vec<u8>)  {
    sink.extend(src)
}

#[no_mangle]
pub fn extend_from_vec_deque(src: &mut VecDeque<u8>, sink: &mut Vec<u8>)  {
    sink.extend(src.range(0..src.len()))
}


Solution

  • Standard library uses a specialization to provide faster implementations of Extend for some collections it knows about. However you can't use it as it is unstable. If you want to manually implement Extend you can do it for a new type. Otherwise just write normal method RingbufIter::extend_into_vec(&mut self, &mut Vec<u8>). For example:

    #[no_mangle]
    pub fn extend_from_custom_iter(src: RingbufIter, sink: &mut Vec<u8>)  {
        // Write optimized version that copies slices here
        // instead of relying on `Vec as Extend`.
    }
    

    This unfortunately will prevent you from writing generic code, but if your use case is to only use your custom ring buffer and Vec, this should be fine.

    EDIT. Actually using new type and implementing Extend for it won't be possible, since Extend::extend is generic over iterator from which it extends. So to use specialized algorithm for your RingbufIter you would have to use another trait SpecializedExtend and implement it for RingbufIter and provide a blanked implementation for all other iterators, which is what std does internally and what is unstable. So you have to use concrete implementation method.