Tags: rust, automatic-ref-counting, unsafe

Is there a sound way to convert/transmute from Arc<String> to Arc<Vec<u8>>?


In practice, String and Vec<u8> have the same memory layout (String is a thin wrapper around a Vec<u8>), although the language does not guarantee this.

String also has an into_bytes method returning a Vec<u8>.
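
As a minimal sketch of why `into_bytes` is attractive here: it consumes the `String` and hands back its existing heap buffer rather than copying the contents. The helper name `into_bytes_reuses_buffer` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper: reports whether `into_bytes` returned the same heap
/// buffer (same pointer and capacity) that the `String` owned.
fn into_bytes_reuses_buffer(s: String) -> bool {
    // Remember where the string's buffer lives before consuming it.
    let ptr = s.as_ptr();
    let cap = s.capacity();

    // `into_bytes` moves the underlying `Vec<u8>` out of the `String`.
    let bytes = s.into_bytes();

    bytes.as_ptr() == ptr && bytes.capacity() == cap
}
```

So the string data itself never needs to be copied. What the question wants to avoid is a second allocation for the `Arc`.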

Is there a sound way to convert from an Arc<String> to an Arc<Vec<u8>> without allocating any memory for a new Arc? We can assert that the Arc's refcount is 1. I don't mind using unsafe. I also don't mind asserting that the two types have the same size, or falling back to a new allocation if they don't. Is this possible?


Solution

  • @eggyal's approach is unfortunately fundamentally unsound under Stacked Borrows. Here is a somewhat different approach that is sound (I hope):

    use std::mem::{align_of, size_of};
    use std::sync::Arc;
    
    // We can check that size and alignment match at compile-time, which saves us from
    // performing these checks at runtime.
    const _: () = {
        assert!(size_of::<String>() == size_of::<Vec<u8>>());
        assert!(align_of::<String>() == align_of::<Vec<u8>>());
    };
    
    pub fn convert(mut arc: Arc<String>) -> Arc<Vec<u8>> {
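        // Panics unless this is the only strong reference and no `Weak`s exist,
        // so nobody else can observe the slot while we rewrite it.
        Arc::get_mut(&mut arc).expect("`arc` must be uniquely owned");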
    
        let raw_string = Arc::into_raw(arc).cast_mut();
        let raw_bytes = raw_string.cast::<Vec<u8>>();
    
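        // SAFETY: `raw_string` points to a valid, initialized `String` that we own
        // exclusively. After this read the slot is logically moved-from until we
        // write to it below. If anything panicked in between, the local `String`
        // would be dropped once and the `Arc` leaked (we already called `into_raw`),
        // so there cannot be a double drop.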
        let string = unsafe { raw_string.read() };
    
        let bytes = string.into_bytes();
    
        // SAFETY:
        //  - We are the only one pointing to this `Arc`, so no data race can occur.
        //  - `String` and `Vec<u8>` have the same size and alignment, so the write is in bounds.
        //  - We converted the `String` to `Vec` using its methods, so we know it is valid.
        unsafe { raw_bytes.write(bytes) };
    
        // SAFETY: The slot now holds an initialized `Vec<u8>`, which has the same size
        // and alignment as `String` (checked at compile time above).
        // We only used raw pointers after `into_raw`, so we have no aliasing problems.
        unsafe { Arc::from_raw(raw_bytes) }
    }
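
    The question also mentions falling back to an allocation. A possible variant, sketched under the same assumptions as `convert` above, copies the bytes into a new `Arc` instead of panicking when the `Arc` is shared. The name `convert_or_alloc` is hypothetical, not part of the answer:

```rust
use std::mem::{align_of, size_of};
use std::sync::Arc;

// Same compile-time layout check as in the answer above.
const _: () = {
    assert!(size_of::<String>() == size_of::<Vec<u8>>());
    assert!(align_of::<String>() == align_of::<Vec<u8>>());
};

/// Hypothetical helper: reuses the `Arc` allocation when it is uniquely owned,
/// otherwise copies the bytes into a freshly allocated `Arc`.
pub fn convert_or_alloc(mut arc: Arc<String>) -> Arc<Vec<u8>> {
    if Arc::get_mut(&mut arc).is_none() {
        // Other `Arc`s or `Weak`s exist, so the slot must not be rewritten in place.
        return Arc::new(arc.as_bytes().to_vec());
    }

    let raw_string = Arc::into_raw(arc).cast_mut();
    let raw_bytes = raw_string.cast::<Vec<u8>>();

    // SAFETY (same reasoning as `convert`): we are the unique owner, the slot holds
    // a valid `String`, both types have the same size and alignment, and the slot
    // is re-initialized as a `Vec<u8>` before it is handed back to `Arc::from_raw`.
    unsafe {
        let bytes = raw_string.read().into_bytes();
        raw_bytes.write(bytes);
        Arc::from_raw(raw_bytes)
    }
}
```

    With a unique `Arc`, the returned `Arc<Vec<u8>>` points at the same allocation as the input. With a shared one, the other handles keep seeing the original `String`, and the caller gets an independent copy.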