Tags: rust, ffi, unsafe, maybeuninit

What does "uninitialized" mean in the context of FFI?


I'm writing some GPU code for macOS using the metal crate. In doing so, I allocate a Buffer object by calling:

let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared)

This call goes through FFI to Apple's Metal API, which allocates a region of memory that both the CPU and GPU can access; the Rust wrapper then returns a Buffer object. I can then get a pointer to this region of memory by doing:

let data = buffer.contents() as *mut u32

In the colloquial sense, this region of memory is uninitialized. However, is this region of memory "uninitialized" in the Rust sense?

Is this sound?

let num_bytes = num_u32 * std::mem::size_of::<u32>();
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut u32;

let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };

for i in as_slice {
  *i = 42u32;
}

Here I'm writing u32s to a region of memory returned to me by FFI. From the nomicon:

...The subtle aspect of this is that usually, when we use = to assign to a value that the Rust type checker considers to already be initialized (like x[i]), the old value stored on the left-hand side gets dropped. This would be a disaster. However, in this case, the type of the left-hand side is MaybeUninit<Box<u32>>, and dropping that does not do anything! See below for some more discussion of this drop issue.

None of the from_raw_parts rules are violated and u32 doesn't have a drop method.

  • Nonetheless, is this sound?
  • Would reading from the region (as u32s) before writing to it be sound (nonsense values aside)? The region of memory is valid and u32 is defined for all bit patterns.

Best practices

Now consider a type T that does have a drop method (and you've done all the bindgen and #[repr(C)] nonsense so that it can go across FFI boundaries).

In this situation, should one:

  • Initialize the buffer in Rust by scanning the region with pointers and calling .write()?
  • Do:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };

for i in as_slice {
  *i = MaybeUninit::new(T::new());
}

Furthermore, after initializing the region, how does the Rust compiler remember this region is initialized on subsequent calls to .contents() later in the program?

Thought experiment

In some cases, the buffer is the output of a GPU kernel and I want to read the results. All the writes occurred in code outside of Rust's control, and when I call .contents(), the region of memory behind the returned pointer contains the correct uint32_t values. The following thought experiment should convey my concern.

Suppose I call C's malloc, which returns an allocated buffer of uninitialized data. Reading u32 values from this buffer (the pointers are properly aligned and in bounds), or values of any other type, should fall squarely into undefined behavior.

However, suppose I instead call calloc, which zeros the buffer before returning it. If you don't like calloc, then suppose I have an FFI function that calls malloc, explicitly writes 0 to every uint32_t element in C, then returns this buffer to Rust. This buffer is initialized with valid u32 bit patterns.

  • From Rust's perspective, does malloc return "uninitialized" data while calloc returns initialized data?
  • If the cases are different, how would the Rust compiler know the difference between the two with respect to soundness?
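The two cases can be sketched without any C code. This is only an illustration under assumptions: Rust's own `std::alloc::alloc` and `alloc_zeroed` stand in for `malloc` and `calloc`, and the function names `sum_zeroed` and `sum_after_fill` are made up for the example. The zeroed region is read as `u32` directly, while the non-zeroed region is only viewed as `MaybeUninit<u32>` until every element has been written.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};
use std::mem::MaybeUninit;
use std::slice;

/// Stand-in for `calloc`: every byte was zeroed before we see the region,
/// so viewing it as `u32` is fine.
fn sum_zeroed(len: usize) -> u32 {
    assert!(len > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::array::<u32>(len).expect("layout overflow");
    unsafe {
        let ptr = alloc_zeroed(layout) as *mut u32;
        assert!(!ptr.is_null(), "allocation failed");
        let values = slice::from_raw_parts(ptr, len);
        let sum: u32 = values.iter().sum();
        dealloc(ptr as *mut u8, layout);
        sum
    }
}

/// Stand-in for `malloc`: nothing has been written yet, so the region is
/// only viewed as `MaybeUninit<u32>` until every element is written.
fn sum_after_fill(len: usize, value: u32) -> u32 {
    assert!(len > 0, "zero-sized allocations are not allowed here");
    let layout = Layout::array::<u32>(len).expect("layout overflow");
    unsafe {
        let ptr = alloc(layout) as *mut MaybeUninit<u32>;
        assert!(!ptr.is_null(), "allocation failed");
        let uninit = slice::from_raw_parts_mut(ptr, len);
        for slot in uninit.iter_mut() {
            slot.write(value);
        }
        // Every element is now initialized, so a `u32` view is allowed.
        let values = slice::from_raw_parts(ptr as *const u32, len);
        let sum: u32 = values.iter().sum();
        dealloc(ptr as *mut u8, layout);
        sum
    }
}
```

Nothing in the types distinguishes the two regions; the difference lives entirely in what the `unsafe` code is entitled to assume about how the memory was produced.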

Solution

  • There are multiple parameters to consider when you have an area of memory:

    • The size of it is the most obvious.
    • Its alignment is still somewhat obvious.
    • Whether or not it's initialized and, notably for types like bool, whether it's initialized with valid values, since not all bit patterns are valid.
    • Whether it's concurrently read/written.

    Focusing on the trickier aspects, the recommendation is:

    • If the memory is potentially uninitialized, use MaybeUninit.
    • If the memory is potentially concurrently read/written, use a synchronization method -- be it a Mutex or AtomicXXX or ....

    And that's it. Doing so will always be sound, no need to look for "excuses" or "exceptions".
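As a minimal sketch of the first recommendation, the helper below treats a region as `MaybeUninit<u32>`, writes every element, and only then hands back a `&mut [u32]`. The names `fill_u32` and `demo` are hypothetical, and a `Vec<MaybeUninit<u32>>` stands in for memory returned over FFI so the example runs without the metal crate.

```rust
use std::mem::MaybeUninit;
use std::slice;

/// Hypothetical helper: writes `value` into every slot of a possibly
/// uninitialized region and returns an initialized view of it.
///
/// # Safety
/// `data` must be non-null, aligned, valid for reads and writes of `len`
/// elements, and not accessed by anything else while the returned slice
/// is alive.
unsafe fn fill_u32<'a>(data: *mut MaybeUninit<u32>, len: usize, value: u32) -> &'a mut [u32] {
    let uninit = unsafe { slice::from_raw_parts_mut(data, len) };
    for slot in uninit.iter_mut() {
        slot.write(value);
    }
    // Every element has been written, so reinterpreting as `u32` is allowed.
    unsafe { slice::from_raw_parts_mut(data as *mut u32, len) }
}

fn demo() -> Vec<u32> {
    // Stand-in for the FFI buffer: four slots with no defined contents.
    let mut backing = vec![MaybeUninit::<u32>::uninit(); 4];
    // Safety: `backing` owns four aligned slots and is not touched elsewhere
    // while `filled` is in use.
    let filled = unsafe { fill_u32(backing.as_mut_ptr(), backing.len(), 42) };
    filled.to_vec()
}
```

Keeping the `MaybeUninit` view private to the helper means callers only ever see a slice whose elements are known to be written.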

    Hence, in your case:

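    // Checked multiplication, so an overflow cannot slip past the assert below.
    let num_bytes = num_u32
        .checked_mul(std::mem::size_of::<u32>())
        .expect("buffer size overflows usize");
    assert!(num_bytes <= isize::MAX as usize);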
    
    let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
    
    let data = buffer.contents() as *mut MaybeUninit<u32>;
    
    //  Safety:
    //  - `data` is valid for reads and writes.
    //  - `data` points to `num_u32` elements.
    //  - Access to `data` is exclusive for the duration.
    //  - `num_u32 * size_of::<u32>() <= isize::MAX`.
    let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };
    
    for i in as_slice {
        i.write(42);  //  Yes you can write `*i = MaybeUninit::new(42);` too,
                      //  but why would you?
    }
    
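    // OR with nightly (an associated function when this was written; the
    // name has changed since, so check the current `MaybeUninit` docs):
    
    MaybeUninit::write_slice(as_slice, some_slice_of_u32s);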
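For the question's case of a type with a destructor, the same pattern can be sketched as follows. `Tracked`, `DROPS`, and `init_and_drop` are hypothetical names, and a `Vec<MaybeUninit<Tracked>>` again stands in for the FFI buffer. The sketch illustrates two points: `MaybeUninit::write` does not drop whatever was in the slot before, and `MaybeUninit` never drops its contents by itself, so cleanup has to be done explicitly.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;

thread_local! {
    // Counts how many times `Tracked::drop` has run on this thread.
    static DROPS: Cell<usize> = Cell::new(0);
}

/// Hypothetical element type with a destructor.
#[allow(dead_code)]
struct Tracked(u32);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

/// Returns (drops seen during initialization, drops seen after cleanup).
fn init_and_drop(len: usize) -> (usize, usize) {
    DROPS.with(|d| d.set(0));
    // Stand-in for the FFI buffer: slots that hold no valid `Tracked` yet.
    let mut slots: Vec<MaybeUninit<Tracked>> =
        (0..len).map(|_| MaybeUninit::uninit()).collect();

    for (i, slot) in slots.iter_mut().enumerate() {
        // `write` moves the value in without dropping the old contents.
        slot.write(Tracked(i as u32));
    }
    let during_init = DROPS.with(|d| d.get());

    // `MaybeUninit` never drops its contents, so drop each element by hand.
    for slot in slots.iter_mut() {
        // Safety: every slot was initialized in the loop above.
        unsafe { slot.assume_init_drop() };
    }
    let after_cleanup = DROPS.with(|d| d.get());
    (during_init, after_cleanup)
}
```

The compiler does not track which slots behind a raw pointer are initialized; that bookkeeping stays with the code that writes and later drops them.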