Search code examples
rustunsafe

Iterator Optimization done by the rust compiler


I'm trying to understand how rust does optimization with iterators. Consider the following code

let v: Vec<T> = vec![...]
let v2: Vec<P> = v.into_iter().map(|x| x.transform()).collect();

where transform() consumes self and return another structure P. That is the function signature looks like this

impl T {
  fn transform(self) -> P {...}
}

If T and P have the same size, would the compiler be able to optimize the code to not allocate another vector in memory and do the map in place? While it is doing the map, it would have half of the vector being on type and another being another, but the structure of the code does not let the user access the vector in the broken state in between.

If this optimization is not done, is there a way to do it with unsafe code? I would iter_mut with transmute work? e.g.

    v.iter_mut().for_each(|x| {
        let p: *mut T = x;
        unsafe {
            let x = ptr::read(p);
            let y = x.transform();
            ptr::write(p, transmute::<P, T>(y));
        }
    });
    let v = unsafe {transmute::<Vec<T>, Vec<P>>(v)};

Lastly, if you could tell me the method you used to investigate the optimization that would be great. I'm new to low level programming so any pointer in the general direction would be greatly appreciated.


Solution

  • Yes, it would.

    Okay, technically the compiler won't. But the standard library has specialization for that.

    The docs even say that Vec might do that.

    Vec may use any or none of the following strategies, depending on the supplied iterator:

    ...

    • perform the iteration in-place on the original allocation backing the iterator

    In practice, currently it does.

    Here's an example:

    pub fn foo(data: Vec<u32>) -> Vec<i32> {
        data.into_iter().map(|v| v as i32).collect()
    }
    

    This compiles to the following assembly:

    example::foo::h155d07d1e93916ee:
            mov     rax, rdi
            movabs  rcx, 4611686018427387903
            and     rcx, qword ptr [rsi + 16]
            movups  xmm0, xmmword ptr [rsi]
            movups  xmmword ptr [rdi], xmm0
            mov     qword ptr [rdi + 16], rcx
            ret
    

    That is, only move the vector. Not even a loop.

    In-place iteration only works if:

    • The alignment of the source and destination type is the same.
    • The iterator adapters used are from std (since only they implement the required traits).
    • The original collection supports in-place iteration (only Vec and BinaryHeap).
    • If you have flatten(), flat_map() or array_chunks() in the pipeline, there is some complicated calculation going on.

    The implementation is here.

    If you need a guarantee, don't transmute the Vec, it is unsound. Instead, use Vec::from_raw_parts().