
Rust FFI - Dangling pointer


I work on a Rust library used, through C headers, in a Swift UI.

I can read a string sent from Swift in Rust, but I can't immediately send what I've just read back from Rust to Swift.

--

Basically, I can successfully convert a *const i8 containing hello world into a String.

But the pointer that as_ptr() returns for that same String is inconsistent, and Swift can't decode it as UTF-8:

  1. Swift sends hello world as a *const i8
  2. Rust handles it successfully through let input: &str (first print in get_message()), which correctly prints hello world
  3. Now I can't convert this input &str back to a pointer:
  • the pointer can't be decoded by Swift
  • the pointer value changes on every call of the function (I expected it to always be the same, as it is for "hello world".as_ptr())
Basically, why is it that:

  • "hello world".as_ptr() always returns the same value and is decoded correctly by Swift,
  • while input.as_ptr() returns a different value on every call and can never be decoded by Swift (even though printing input correctly shows hello world)?

Do you guys have ideas?

use std::ffi::CStr;

#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: libc::size_t,
}

/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
unsafe fn c_string_safe(cstring: *const i8) -> String {
    CStr::from_ptr(cstring).to_string_lossy().into_owned()
}

/// # Safety
/// call of get_message from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
/// on `async extern "C"` => <https://stackoverflow.com/a/52521592/7281870>
#[no_mangle]
#[tokio::main] // allows an async body; needed to call other async functions (not in this example)
pub async unsafe extern "C" fn get_message(
    user_input: *const i8,
) -> MessageC {
    let input: &str = &c_string_safe(user_input);
    println!("from Swift: {}", input); // [consistent] from Swift: hello world
    println!("converted to ptr: {:?}", input.as_ptr()); // [inconsistent] converted to ptr: 0x60000079d770 / converted to ptr: 0x6000007b40b0
    println!("directly to ptr: {:?}", "hello world".as_ptr()); // [consistent] directly to ptr: 0x1028aaf6f
    MessageC {
        message_bytes: input.as_ptr(),
        message_len: input.len() as libc::size_t,
    }
}


Solution

  • The way you construct MessageC is unsound and returns a dangling pointer. The code in get_message() is equivalent to this:

    pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
        let _invisible = c_string_safe(user_input);
        let input: &str = &_invisible;
        // let's skip the prints
        let msg = MessageC {
            message_bytes: input.as_ptr(),
            message_len: input.len() as libc::size_t,
        };
        drop(_invisible);
        return msg;
    }
    

    Hopefully this formulation highlights the issue: c_string_safe() returns an owned heap-allocated String which gets dropped (and its data deallocated) by the end of the function. input is a slice that refers to the data allocated by that String. In safe Rust you wouldn't be allowed to return a slice referring to a local variable such as input - you'd have to either return the String itself or limit yourself to passing the slice downwards to functions.
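    For comparison, a minimal safe-Rust sketch of the two options named above: return the owned String, or only lend the slice to callees. The names read_input, word_count and read_input_borrowed are hypothetical, and the parameter uses std::os::raw::c_char (which is i8 on Apple platforms) rather than *const i8 so the snippet compiles on any target.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Option 1: return the owned String. The caller now owns the heap buffer,
/// so nothing is freed behind its back.
///
/// # Safety
/// `user_input` must point to a valid NUL-terminated C string.
pub unsafe fn read_input(user_input: *const c_char) -> String {
    CStr::from_ptr(user_input).to_string_lossy().into_owned()
}

/// Option 2: only pass the slice downwards. `input` borrows from a String
/// owned by the caller, which outlives this call.
pub fn word_count(input: &str) -> usize {
    input.split_whitespace().count()
}

// What the borrow checker rejects: returning a slice of a local String,
// which is what the original get_message() does through a raw pointer.
//
// unsafe fn read_input_borrowed<'a>(user_input: *const c_char) -> &'a str {
//     let owned = read_input(user_input);
//     &owned // error: `owned` is dropped at the end of the function
// }
```

    In safe Rust the compiler catches the commented-out version; once the slice is turned into a raw pointer, that check no longer applies.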

    However, you're not using safe Rust and you're creating a pointer to the heap-allocated data. Now you have a problem because as soon as get_message() returns, the _invisible String gets deallocated, and the pointer you're returning is dangling. The dangling pointer may even appear to work because deallocation is not obligated to clear the data from memory, it just marks it as available for future allocations. But those future allocations can and will happen, perhaps from a different thread. Thus a program that references freed memory is bound to misbehave, often in an unpredictable fashion - which is precisely what you have observed.

    In all-Rust code you'd resolve the issue by safely returning String instead. But you're doing FFI, so you must reduce the string to a pointer/length pair. Rust lets you do exactly that; the easiest way is to call std::mem::forget() to prevent the string from being deallocated:

    pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
        let mut input = c_string_safe(user_input);
        input.shrink_to_fit(); // ensure string capacity == len
        let msg = MessageC {
            message_bytes: input.as_ptr(),
            message_len: input.len() as libc::size_t,
        };
        std::mem::forget(input); // prevent input's data from being deallocated on return
        msg
    }
    
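    To check that the pointer now survives the return, here is a minimal synchronous sketch of the same body without async/tokio. The names get_message_sync and read_message are hypothetical, c_char replaces i8 for portability, and usize stands in for libc::size_t (which libc defines as an alias of usize). read_message reads the data back the way Swift would, from the pointer and length alone.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

#[repr(C)]
#[derive(Debug)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: usize, // libc::size_t is an alias for usize
}

unsafe fn c_string_safe(cstring: *const c_char) -> String {
    CStr::from_ptr(cstring).to_string_lossy().into_owned()
}

/// Hypothetical synchronous stand-in for get_message(): same body as above.
pub unsafe fn get_message_sync(user_input: *const c_char) -> MessageC {
    let mut input = c_string_safe(user_input);
    input.shrink_to_fit(); // capacity == len, needed later to rebuild the String
    let msg = MessageC {
        message_bytes: input.as_ptr(),
        message_len: input.len(),
    };
    std::mem::forget(input); // the heap buffer now outlives this function
    msg
}

/// Hypothetical helper: reads the message from the pointer/length pair only
/// (there is no trailing NUL, so the length is required).
pub unsafe fn read_message(m: &MessageC) -> &str {
    let bytes = std::slice::from_raw_parts(m.message_bytes, m.message_len);
    std::str::from_utf8(bytes).expect("to_string_lossy() always yields valid UTF-8")
}
```

    As written, the buffer is leaked until something rebuilds and drops it, which is the problem discussed next.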

    But now you have a different problem: get_message() allocates a string, but how do you deallocate it? Just dropping MessageC won't do it because it just contains pointers. (And doing so by implementing Drop would probably be unwise because you're sending it to Swift or whatever.) The solution is to provide a separate function that re-creates the String from the MessageC and drops it immediately:

    pub unsafe fn free_message_c(m: MessageC) {
        // The call to `shrink_to_fit()` above makes it sound to re-assemble
        // the string using a capacity equal to its length
        drop(String::from_raw_parts(
            m.message_bytes as *mut _,
            m.message_len,
            m.message_len,
        ));
    }
    

    You should call this function once you're done with MessageC, i.e. when the Swift code has done its job. (You could even make it extern "C" and call it from Swift.)
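    A variant worth considering, sketched under the same assumptions: String::into_boxed_str() discards excess capacity as part of the conversion, so the allocation is exactly message_len bytes by construction, and the free function rebuilds a Box<str> instead of relying on capacity == len. The helper name message_from_string is hypothetical; free_message_c is shown as the extern "C" version mentioned above, and usize again stands in for libc::size_t.

```rust
#[repr(C)]
#[derive(Debug)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: usize,
}

/// Hypothetical helper: hands ownership of `s` to foreign code.
/// into_boxed_str() drops any spare capacity, so the allocation is exactly
/// `message_len` bytes long.
pub fn message_from_string(s: String) -> MessageC {
    let boxed: Box<str> = s.into_boxed_str();
    let len = boxed.len();
    let raw: *mut str = Box::into_raw(boxed); // ownership leaves Rust here
    MessageC {
        message_bytes: raw as *const u8, // keep the data pointer, drop the length metadata
        message_len: len,
    }
}

/// Frees a MessageC produced by message_from_string; callable from Swift
/// through the C header.
///
/// # Safety
/// `m` must come from message_from_string and must be freed at most once.
#[no_mangle]
pub unsafe extern "C" fn free_message_c(m: MessageC) {
    if m.message_bytes.is_null() {
        return; // tolerate a zeroed struct coming back from the foreign side
    }
    // Rebuild the fat pointer (data + length) and let Box free the allocation.
    let bytes = std::ptr::slice_from_raw_parts_mut(m.message_bytes as *mut u8, m.message_len);
    drop(Box::from_raw(bytes as *mut str));
}
```

    Empty strings need no special case here: a zero-length Box<str> never allocates, and Box::from_raw on it frees nothing.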

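    Finally, using "hello world".as_ptr() directly works because "hello world" is a static &str that is baked into the executable and never gets deallocated. In other words, it doesn't point to a String's heap buffer; it points to static data that ships with the program, which is also why its address is the same on every call. input.as_ptr(), by contrast, points into a String that is freshly allocated on each call, so a different address each time is expected and is not itself the bug; the bug is that the allocation is freed before Swift reads it.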
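    To make the contrast concrete, a small sketch (the function name static_message is hypothetical, and usize stands in for libc::size_t): a MessageC built from a string literal stays valid after the function returns because the bytes live in the binary's read-only data rather than on the heap. Such a message must never be passed to free_message_c(), since nothing was allocated.

```rust
#[repr(C)]
#[derive(Debug)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: usize,
}

/// Hypothetical example: points at a string literal embedded in the binary.
/// No allocation happens, so the pointer never dangles and must not be freed.
pub fn static_message() -> MessageC {
    let s: &'static str = "hello world";
    MessageC {
        message_bytes: s.as_ptr(),
        message_len: s.len(),
    }
}
```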