rust unsafe type-erasure raw-pointer

Is it safe to clone a type-erased Arc via raw pointer?


I'm in a situation where I'm working with data wrapped in an Arc, and I sometimes end up using into_raw to get the raw pointer to the underlying data. My use case also calls for type-erasure, so the raw pointer often gets cast to a *const c_void, then cast back to the appropriate concrete type when re-constructing the Arc.

I've run into a situation where it would be useful to be able to clone the Arc without needing to know the concrete type of the underlying data. As I understand it, it should be safe to reconstruct the Arc with a dummy type solely for the purpose of calling clone, so long as I never actually dereference the data. So, for example, this should be safe:

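use std::ffi::c_void;
use std::mem;
use std::sync::Arc;

pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    // Rebuilds the Arc with the dummy type c_void, only to bump the count.
    let original = Arc::from_raw(handle);
    let copy = original.clone();
    mem::forget(original);
    Arc::into_raw(copy)
}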

Is there anything that I'm missing that would make this actually unsafe? Also, I assume the answer would apply to Rc as well, but if there are any differences please let me know!


Solution

  • This is almost always unsafe.

    An Arc<T> is just a pointer to a heap-allocated struct which roughly looks like

    struct ArcInner<T: ?Sized> {
        strong: atomic::AtomicUsize,
        weak: atomic::AtomicUsize,
        data: T,  // You get a raw pointer to this element
    }
    

    into_raw() gives you a pointer to the data field. Arc::from_raw() takes such a pointer, assumes that it points to the data field of an ArcInner<T>, and walks back in memory to where it expects the start of that ArcInner<T>. How far it walks back depends on the memory layout of T, specifically its alignment and therefore its exact offset within ArcInner.

    If you call into_raw() on an Arc<U> and then call from_raw() as if it were an Arc<V>, where U and V differ in alignment, the computed offset of the data field inside ArcInner will be wrong. The reconstructed Arc then looks for the reference counts at the wrong address, and the call to .clone() increments whatever happens to be there, corrupting memory. So you do not need to dereference T to trigger memory unsafety.

    In practice, this might not be a problem: since data is the third field after two usize-sized counters, any T whose alignment is no larger than that two-counter header should land at the same offset, and that covers most types. However, if the stdlib implementation changes, or you end up compiling for a platform where this assumption is wrong, calling Arc<V>::from_raw on a pointer that came from an Arc<U> with a different memory layout is undefined behavior: it may crash, or it may silently corrupt memory.


    Update:

    Having thought about it some more I downgrade my vote from "might be safe, but cringy" to "most likely unsafe" because I can always do

    #[repr(align(32))]
    struct Foo;
    
    let foo = Arc::new(Foo);
    

    In this example Foo is aligned to 32 bytes, so on a 64-bit target ArcInner<Foo> is 32 bytes in size (8 + 8 for the two counters, 16 bytes of padding, 0 for the data) and data sits at offset 32, while an ArcInner<()> is just 16 bytes (8 + 8 + 0 + 0) with data at offset 16. Since there is no way to tell what the alignment of T is after the type has been erased, there is no way to reconstruct a valid Arc.
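
    The arithmetic above can be checked with a small sketch. It does not touch the real ArcInner, which is private to the standard library; it assumes the "two counters followed by T" model shown earlier and computes the layout with std::alloc::Layout. The helper name inner_layout is made up for this illustration.

```rust
use std::alloc::Layout;
use std::sync::atomic::AtomicUsize;

// Same over-aligned, zero-sized type as in the example above.
#[repr(align(32))]
struct Foo;

/// Hypothetical helper: returns (offset of `data`, total size) for a
/// C-style struct made of two `AtomicUsize` counters followed by a `T`.
/// Assumption: this models the ArcInner sketch from the answer, not the
/// actual standard-library type, whose layout may change.
fn inner_layout<T>() -> (usize, usize) {
    // Layout of the two reference counters on their own.
    let counters = Layout::new::<[AtomicUsize; 2]>();
    // Append a `T`; `extend` inserts the padding `T`'s alignment requires
    // and reports the offset at which `T` ends up.
    let (combined, data_offset) = counters
        .extend(Layout::new::<T>())
        .expect("layout overflow");
    // Round the total size up to the struct's alignment.
    (data_offset, combined.pad_to_align().size())
}
```

    Under that model, inner_layout::<Foo>() and inner_layout::<()>() disagree about where data lives, which is exactly the information that is lost once the pointer has been cast to *const c_void.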

    There is an escape hatch that might be safe in practice: by wrapping T in another Box, the Arc always holds a Box<T>, which for a sized T is a single pointer, so the layout of ArcInner<Box<T>> is the same for every T. To force this on every user, you can do something like

    struct ArcBox<T>(Arc<Box<T>>);
    

    and implement Deref on that. Using ArcBox instead of Arc forces the memory layout of ArcInner to always be the same, because T is behind another pointer. This, however, means that all access to T requires a double dereference, which might badly affect performance.
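
    A minimal sketch of that wrapper, under stated assumptions: the into_raw, from_raw and clone_raw helpers below are hypothetical names added for illustration, T is sized, and the type-erased clone relies on Box<()> having the same size and alignment as Box<T>. Treat it as an illustration of the idea, not a vetted implementation.

```rust
use std::ffi::c_void;
use std::mem::ManuallyDrop;
use std::ops::Deref;
use std::sync::Arc;

/// The wrapper from the answer: the Arc payload is always a thin Box pointer.
pub struct ArcBox<T>(Arc<Box<T>>);

impl<T> ArcBox<T> {
    pub fn new(value: T) -> Self {
        ArcBox(Arc::new(Box::new(value)))
    }

    /// Hypothetical helper: erase the type and hand out a raw handle.
    pub fn into_raw(self) -> *const c_void {
        Arc::into_raw(self.0) as *const c_void
    }

    /// Hypothetical helper: rebuild the wrapper from a raw handle.
    ///
    /// # Safety
    /// `handle` must come from `ArcBox::<T>::into_raw` (or `clone_raw`)
    /// with the same `T`, and each handle may be consumed only once.
    pub unsafe fn from_raw(handle: *const c_void) -> Self {
        ArcBox(unsafe { Arc::from_raw(handle as *const Box<T>) })
    }
}

impl<T> Deref for ArcBox<T> {
    type Target = T;

    // Double dereference: through the Arc, then through the Box.
    fn deref(&self) -> &T {
        &**self.0
    }
}

/// Clones a handle without knowing `T`.
///
/// # Safety
/// `handle` must come from `ArcBox::<T>::into_raw` for some sized `T`.
/// Assumption: `Box<()>` and `Box<T>` share size and alignment, so the
/// counters sit at the same offset whatever `T` was.
pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    // ManuallyDrop keeps the temporary Arc from decrementing the count.
    let original =
        ManuallyDrop::new(unsafe { Arc::from_raw(handle as *const Box<()>) });
    // Only the strong count is touched; the Box is never read or dropped.
    let copy = Arc::clone(&*original);
    Arc::into_raw(copy) as *const c_void
}
```

    A caller would turn an ArcBox<T> into a handle with into_raw, pass it through clone_raw as often as needed, and eventually hand every handle back to ArcBox::<T>::from_raw with the original T so that the counts balance. Whether the dummy-type reconstruction is acceptable hinges on the safety requirements documented for Arc::from_raw, so check them against the Rust version you target.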