windows ubuntu rust memory-management unsafe

Why does this unsafe Rust code work on Windows but not on Ubuntu?


Hello, I know this code could be written without any unsafe code, but I am doing research and learning how things work "under the hood".

Back to the topic, I've written a piece of unsafe Rust code that in my opinion should work without any issues.

This is the definition:

use std::pin::Pin;

pub struct Container {
    inner: Pin<Box<String>>,
    half_a: *const str,
    half_b: *const str,
}

impl Container {
    const SEPARATOR: char = '-';

    pub fn new(input: impl AsRef<str>) -> Option<Self> {
        let input = input.as_ref();
        if input.is_empty() {
            return None
        }

        // Making sure the value is never moved in the memory
        let inner = Box::pin(input.to_string());

        let separator_index = inner.find(Container::SEPARATOR)?;
        let inner_ref = &**inner;

        let half_a = &inner_ref[0..separator_index];
        let half_b = &inner_ref[separator_index+1..];

        // Check if the structure definition is met (populated values + only one separator)
        if half_a.is_empty() || half_b.is_empty() || half_b.contains(Container::SEPARATOR) {
            return None;
        }

        Some(Self {
            half_a: half_a as *const str,
            half_b: half_b as *const str,
            inner,
        })
    }
    
    pub fn get_half_a(&self) -> &str {
        unsafe {
            &*self.half_a
        }
    }

    pub fn get_half_b(&self) -> &str {
        unsafe {
            &*self.half_b
        }
    }
}

In summary, it accepts any input that can be represented as a str reference, creates a pinned clone of the input on the heap, takes pointers to both halves of that value, and returns everything as a structure.

Now when I run a test:

let valid = Container::new("first-second").unwrap();
assert_eq!(valid.get_half_a(), "first");
assert_eq!(valid.get_half_b(), "second");

It should run without any panics, and indeed that's what happens on Windows: it compiles and runs without any issues, multiple times. But when it runs on Ubuntu, I get an error showing that the addresses no longer point to a valid place in memory:

 thread 'tests::types::container' panicked at 'assertion failed: `(left == right)`
  left: `"�K\u{13}϶"`,
 right: `"first"`', research/src/tests/types.rs:77:5

What could be the issue here? Did I miss something? I am running this code as a GitHub Action with the setting runs-on: ubuntu-latest.

Here is a URL to the playground showing that this code runs without any issues: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=d36b19de4d0fa05340191f5107029d75

I expected no issues from running this code on a different OS.


Solution

  • Changing Box<String> to Box<str>, which shouldn't affect the soundness, triggers MIRI.

    error: Undefined Behavior: trying to retag from <2563> for SharedReadOnly permission at alloc890[0x0], but that tag does not exist in the borrow stack for this location
      --> src/main.rs:41:18
       |
    41 |         unsafe { &*self.half_a }
       |                  ^^^^^^^^^^^^^
       |                  |
       |                  trying to retag from <2563> for SharedReadOnly permission at alloc890[0x0], but that tag does not exist in the borrow stack for this location
       |                  this error occurs as part of retag at alloc890[0x0..0x5]
       |
       = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
       = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
    help: <2563> was created by a SharedReadOnly retag at offsets [0x0..0x5]
      --> src/main.rs:34:21
       |
    34 |             half_a: half_a as *const str,
       |                     ^^^^^^
    help: <2563> was later invalidated at offsets [0x0..0xc] by a Unique retag (of a reference/box inside this compound value)
      --> src/main.rs:36:13
       |
    36 |             inner,
       |             ^^^^^
       = note: BACKTRACE (of the first span):
       = note: inside `Container::get_half_a` at src/main.rs:41:18: 41:31
    note: inside `main`
      --> src/main.rs:51:16
       |
    51 |     assert_eq!(valid.get_half_a(), "first");
       |                ^^^^^^^^^^^^^^^^^^
    

    This comes from Box, which cannot be aliased. While it's normally fine to derive pointers from Box, when you move the Box (by returning Container), Rust no longer knows that the Box has had pointers derived from it, and assumes accesses through the pointers are invalid due to aliasing.

    That's why MIRI is triggered. However, I'm not certain what makes this undefined behavior in practice. Your test results suggest it is, but I can't tell you why. My guess is that Rust decides inner can be dropped as soon as new returns, since it's guaranteed to be unique. It may even optimize the allocation to never actually write any data (the pointer, length, and capacity of String in your version), since that data is never read, which would explain your runtime error.

    You can fix this by storing pointers only, and implementing Drop. (playground)

    pub struct Container {
        inner: *mut str,
        half_a: *const str,
        half_b: *const str,
    }
    
    impl Drop for Container {
        fn drop(&mut self) {
            // SAFETY: Nothing references this value since it is being dropped,
            // and `half_a` and `half_b` are never read after this.
            unsafe { drop(Box::from_raw(self.inner)) }
        }
    }
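
    For completeness, here is a minimal sketch of how the whole type might look with this approach. The constructor body is my own assumption about how to fill in the rest, not necessarily what the playground link contains: it validates the borrowed input first so that no early return leaks the allocation, then derives both halves from the raw pointer returned by Box::into_raw, so no Box is moved after pointers have been taken from it.

```rust
pub struct Container {
    inner: *mut str,
    half_a: *const str,
    half_b: *const str,
}

impl Container {
    const SEPARATOR: char = '-';

    pub fn new(input: impl AsRef<str>) -> Option<Self> {
        let input = input.as_ref();

        // Validate on the borrowed input first, so that no early return
        // can leak the heap allocation made below.
        let (a, b) = input.split_once(Self::SEPARATOR)?;
        if a.is_empty() || b.is_empty() || b.contains(Self::SEPARATOR) {
            return None;
        }
        let split = a.len();

        // Hand ownership of the heap copy over to a raw pointer. From here
        // on there is no `Box` value that gets moved around with the struct.
        let inner: *mut str = Box::into_raw(Box::<str>::from(input));

        // SAFETY: `inner` was just produced by `Box::into_raw`, so it is
        // non-null, aligned, and points to valid UTF-8 that stays allocated
        // until `Drop` runs.
        let whole: &str = unsafe { &*inner };
        let half_a: *const str = &whole[..split];
        let half_b: *const str = &whole[split + Self::SEPARATOR.len_utf8()..];

        Some(Self { inner, half_a, half_b })
    }

    pub fn get_half_a(&self) -> &str {
        // SAFETY: `half_a` points into the allocation behind `inner`,
        // which is only freed when `self` is dropped.
        unsafe { &*self.half_a }
    }

    pub fn get_half_b(&self) -> &str {
        // SAFETY: same reasoning as in `get_half_a`.
        unsafe { &*self.half_b }
    }
}

impl Drop for Container {
    fn drop(&mut self) {
        // SAFETY: `inner` came from `Box::into_raw` and is freed exactly
        // once, here; `half_a` and `half_b` are never read afterwards.
        unsafe { drop(Box::from_raw(self.inner)) }
    }
}
```

    With this sketch, Container::new("first-second") should yield "first" and "second" from the two getters, including after the Container has been moved. Note that raw pointer fields also make the type neither Send nor Sync by default.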
    

    I don't think Pin does anything for soundness here. Pin is used more in dealing with public interfaces. As long as you don't hand out any &mut references to inner, there's nothing to guard against. While you might want it for internal guarantees, your real guarantees are stronger than Pin since you can't use the value at all.
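
    To illustrate why Pin adds little in the original version: String implements Unpin, so Pin<Box<String>> still implements DerefMut, and safe code can mutate the String or even move it out. The sketch below uses a hypothetical helper name (mutate_through_pin) purely for demonstration. Also note that the pin would apply to the String header on the heap, not to the byte buffer the header points to, and push_str may reallocate that buffer.

```rust
use std::pin::Pin;

/// Hypothetical helper: shows that pinning a `String` does not stop safe
/// code from mutating it or moving it out, because `String: Unpin`.
pub fn mutate_through_pin() -> (String, bool) {
    let mut pinned: Pin<Box<String>> = Box::pin(String::from("first-second"));

    // Allowed in safe code: `Pin<Box<String>>` implements `DerefMut`.
    // This call may reallocate the underlying byte buffer.
    pinned.push_str("-third");

    // Also allowed: move the `String` out, leaving an empty one behind.
    let moved_out: String = std::mem::take(&mut *pinned);

    (moved_out, pinned.is_empty())
}
```

    So any pointers into the string's bytes would have to be protected by keeping the field private and never mutating it, which is what the pointer-only layout already enforces.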