Search code examples
rustlifetimeborrow-checker

How to please the borrow checker when trying to reuse a Vec with a lifetime


A common pattern that happens to me in Rust goes something like this:

struct Foo { /*...*/ }
struct FooProcessor {
    foos: Vec<&'??? mut Foo>, // this lifetime is the issue, see explanation below
    // other data structures needed for processing Foos ....
}
impl FooProcessor {
    // just to make it clear, `self` may outlive `input` and it's `Foo`s
    pub fn process(&mut self, input: &mut [Foo]) {
        // some example operation that requires random access to a subset
        self.foos.extend(input.iter().filter(|f| f.some_property()));
        self.foos.sort(); 

        // work some more on the Foos...

        // remove all references to the processed Foos
        self.foos.clear();
    }
}

The problem is the lifetime of FooProcessor::foos, as highlighted by the questionmarks above. All references to Foos are cleared at the end of process(), but of course the borrow checker doesn't know that.

FooProcessor is usually a large, long lived object (specifically, it may outlive some Foos), and I don't want to reallocate the Vec<&Foo> every time process() is called (not even by using some SmallVec thing).

Is there a good way to solve this that does not require the use of unsafe code (lifetime transmutation / pointers) each time I have this pattern? (If the solution involves using some crate, it's of course fine if that crate uses unsafe internally).


Solution

  • Transmuting Vec<T> to Vec<U> is in general unsound. However, we can use Vec::into_raw_parts to decompose the vector into a data pointer, length, and capacity, transmute the data pointer, and then use Vec::from_raw_parts to recreate the vector with the new type, as long as we can satisfy the invariants of Vec::from_raw_parts.

    Well, we could... but Vec::into_raw_parts is unstable. However, it is implemented in terms of public Vec methods, so it's trivial to copy and paste the implementation:

    fn vec_into_raw_parts<T>(v: Vec<T>) -> (*mut T, usize, usize) {
        let mut v = std::mem::ManuallyDrop::new(v);
        (v.as_mut_ptr(), v.len(), v.capacity())
    }
    

    Ok, now let's walk down the list of Vec::from_raw_parts' invariants:


    ptr must have been allocated using the global allocator, such as via the alloc::alloc function.

    We are getting the pointer from a Vec that uses the global allocator.

    T needs to have the same alignment as what ptr was allocated with. (T having a less strict alignment is not sufficient, the alignment really needs to be equal to satisfy the dealloc requirement that memory must be allocated and deallocated with the same layout.)

    Vec will ensure the allocation is aligned for T, so we can just have our function assert that T and U have the same alignment. As the types are known at compile time, the compiler will elide this check if it would pass.

    The size of T times the capacity (ie. the allocated size in bytes) needs to be the same size as the pointer was allocated with. (Because similar to alignment, dealloc must be called with the same layout size.)

    We will also assert that the sizes of T and U are equal, which satisfies this invariant.

    length needs to be less than or equal to capacity.

    capacity needs to be the capacity that the pointer was allocated with.

    The allocated size in bytes must be no larger than isize::MAX. See the safety documentation of pointer::offset.

    We will get the length, capacity, and pointer from an existing vector, which must satisfy these invariants unless we've already hit some UB somewhere.

    The first length values must be properly initialized values of type T.

    We will clear the vector before decomposing it, so length will be zero, satisfying this invariant vacuously.


    Now we can define a safe transmutation between Vec element types in these terms:

    fn safe_vec_transmute<T, U>(mut v: Vec<T>) -> Vec<U> {
        assert_eq!(std::mem::size_of::<T>(), std::mem::size_of::<U>());
        assert_eq!(std::mem::align_of::<T>(), std::mem::align_of::<U>());
    
        v.clear();
        
        let (ptr, len, cap) = vec_into_raw_parts(v);
    
        // SAFETY:
        //
        // We assert T and U have the same size and alignment, and we clear the
        // vector first.  This satisfies several invariants of Vec:from_raw_parts.
        // The remaining invariants are satisfied because we get ptr, len, and cap
        // from an existing vector.
        unsafe { Vec::from_raw_parts(ptr as *mut U, len, cap) }
    }
    

    To make use of this, you would store foos as Vec<&'static mut T>. In process() you could std::mem::take(&mut self.foos) to steal the allocation, leaving behind a zero-capacity vector, transmute the lifetime using safe_vec_transmute, do your work with the vector, transmute the lifetime back to 'static, and store it back in self.foos.

    It would look something like this (untested code):

    let foos = safe_vec_transmute(std::mem::take(&mut self.foos));
    
    foos.extend(...);
    foos.sort();
    
    // More work
    
    self.foos = safe_vec_transmute(foos);
    

    Note that safe_vec_transmute compiles (with optimizations) to exactly the same instructions as a function that clears a vector and returns it, so there is zero runtime overhead!