Tags: string, pointers, rust, unsafe, raw-pointer

Rust creating String with pointer offset


So let's say I have a String, "Foo Bar", and I want to create a substring "Bar" without allocating new memory.

So I moved the raw pointer of the original string to the start of the substring (in this case offsetting it by 4) and used the String::from_raw_parts() function to create the String.

So far I have the following code, which as far as I understand should do this just fine. I just don't understand why this does not work.

use std::mem;

fn main() {
    let s = String::from("Foo Bar");

    let ptr = s.as_ptr();

    mem::forget(s);

    unsafe {
        // no error when using ptr.add(0)
        let txt = String::from_raw_parts(ptr.add(4) as *mut _, 3, 3);

        println!("{:?}", txt); // This even prints "Bar" but crashes afterwards

        println!("prints because 'txt' is still in scope");
    }

    println!("won't print because 'txt' was dropped");
}

I get the following error on Windows:

error: process didn't exit successfully: `target\debug\main.exe` (exit code: 0xc0000374, STATUS_HEAP_CORRUPTION)

And these on Linux (cargo run; cargo run --release):

munmap_chunk(): invalid pointer

free(): invalid pointer

I think it has something to do with the destructor of String, because as long as txt is in scope the program runs just fine.

Another thing to notice is that when I use ptr.add(0) instead of ptr.add(4) it runs without an error.

Creating a slice, on the other hand, didn't give me any problems. Dropping that worked just fine.

let t = slice::from_raw_parts(ptr.add(4), 3);

In the end I want to split an owned String in place into multiple owned Strings without allocating new memory.

Any help is appreciated.


Solution

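  • The reason for the errors is the way that the allocator works. It is Undefined Behaviour to ask the allocator to free a pointer that it didn't give you in the first place. In this case, the allocator allocated 7 bytes for s and returned a pointer to the first one. However, when txt is dropped, it tells the allocator to deallocate a pointer to byte 4, which it has never seen before. This is why there is no crash when you add(0) instead of add(4): the pointer is then one the allocator recognises. Even so, String::from_raw_parts requires the capacity argument to match the original allocation, so passing 3 for a 7-byte buffer is still incorrect, even if it happens not to crash.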
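As a rough illustration of the point about the offset pointer, here is a minimal sketch using only safe code. The helper name offset_within is made up for this example. It shows that the "Bar" substring starts 4 bytes into the buffer that the allocator handed out, so its address is not one the allocator could be asked to free.

```rust
/// Hypothetical helper: how many bytes `part` starts after the start of `whole`.
/// Assumes `part` is a subslice of `whole`.
fn offset_within(whole: &str, part: &str) -> usize {
    part.as_ptr() as usize - whole.as_ptr() as usize
}

fn main() {
    let s = String::from("Foo Bar");

    // Borrow "Bar"; this slice points into the middle of the buffer owned by `s`.
    let bar = &s[4..];

    // The substring begins 4 bytes past the address the allocator returned.
    assert_eq!(offset_within(&s, bar), 4);

    // Only the base pointer, together with the real capacity, describes
    // something the allocator knows how to free.
    assert!(s.capacity() >= s.len());

    println!("base = {:p}, bar = {:p}", s.as_ptr(), bar.as_ptr());
}
```

Nothing here constructs a String from the offset pointer, so dropping s frees the buffer through the original base pointer as usual.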

    Using unsafe correctly is hard, and you should avoid it where possible.


    Part of the purpose of the &str type is to allow portions of an owned string to be shared, so I would strongly encourage you to use those if you can.
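For the simple case where the borrowed pieces do not need to outlive the original String, a sketch along these lines may be enough. The helper split_first_word is a made-up name for illustration; str::split_once and range indexing are standard library features. No new string data is allocated, because the slices only borrow from s.

```rust
/// Hypothetical helper: split at the first space, returning both halves as
/// borrowed slices of the input.
fn split_first_word(s: &str) -> Option<(&str, &str)> {
    s.split_once(' ')
}

fn main() {
    let s = String::from("Foo Bar");

    // Plain range indexing borrows bytes 4..7 of `s`.
    let bar: &str = &s[4..];
    assert_eq!(bar, "Bar");

    // `split_once` returns the text before and after the first separator.
    let (first, rest) = split_first_word(&s).expect("input contains a space");
    assert_eq!((first, rest), ("Foo", "Bar"));

    // `rest` points into the buffer owned by `s`; nothing was copied.
    assert_eq!(rest.as_ptr(), s[4..].as_ptr());

    println!("{:?} {:?}", first, rest);
}
```

The borrow checker then makes sure s stays alive for as long as first and rest are in use.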

    If you can't just use &str on its own because you aren't able to tie the lifetimes back to the original String, then there are still some solutions, with different trade-offs:

    1. Leak the memory, so it's effectively static:

      let s = String::from("Foo Bar");
      // Box::leak never frees the buffer, so the returned reference is 'static.
      let s = Box::leak(s.into_boxed_str());
      
      let txt: &'static str = &s[4..];
      let s: &'static str = &s[..4];
      

      Obviously, you can only do this a few times in your application, or else you are going to use up too much memory that you can't get back.

    2. Use reference-counting to make sure that the original String stays around long enough for all of the slices to remain valid. Here is a sketch solution:

      use std::{fmt, ops::Deref, rc::Rc};
      
      struct RcStr {
          rc: Rc<String>,
          start: usize,
          len: usize,
      }
      
      impl RcStr {
          fn from_rc_string(rc: Rc<String>, start: usize, len: usize) -> Self {
              RcStr { rc, start, len }
          }
      
          fn as_str(&self) -> &str {
              &self.rc[self.start..self.start + self.len]
          }
      }
      
      impl Deref for RcStr {
          type Target = str;
          fn deref(&self) -> &str {
              self.as_str()
          }
      }
      
      impl fmt::Display for RcStr {
          fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
              fmt::Display::fmt(self.as_str(), f)
          }
      }
      
      impl fmt::Debug for RcStr {
          fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
              fmt::Debug::fmt(self.as_str(), f)
          }
      }
      
      fn main() {
          let s = Rc::new(String::from("Foo Bar"));
      
          let txt = RcStr::from_rc_string(Rc::clone(&s), 4, 3);
          let s = RcStr::from_rc_string(Rc::clone(&s), 0, 4);
      
          println!("{:?}", txt); // "Bar"
          println!("{:?}", s);  // "Foo "
      }
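Building on the reference-counting idea, the original goal of splitting one owned String into several owned pieces could be sketched roughly as follows. The names SharedStr and split_shared are hypothetical, and splitting on whitespace is an assumption made for the example. Rc::new and the returned Vec each make a small allocation of their own, but the text itself is moved into the Rc and never copied.

```rust
use std::{ops::Deref, rc::Rc};

/// Hypothetical owned handle to a byte range of a shared `String`.
#[derive(Clone)]
struct SharedStr {
    rc: Rc<String>,
    start: usize,
    end: usize,
}

impl Deref for SharedStr {
    type Target = str;
    fn deref(&self) -> &str {
        &self.rc[self.start..self.end]
    }
}

/// Hypothetical helper: consume `s` and return one handle per
/// whitespace-separated word.
fn split_shared(s: String) -> Vec<SharedStr> {
    let rc = Rc::new(s);
    let base = rc.as_str().as_ptr() as usize;
    rc.split_whitespace()
        .map(|word| {
            // Each word is a subslice of the shared string, so its offset is
            // the distance between the two start addresses.
            let start = word.as_ptr() as usize - base;
            SharedStr { rc: Rc::clone(&rc), start, end: start + word.len() }
        })
        .collect()
}

fn main() {
    let mut parts = split_shared(String::from("Foo Bar"));
    let words: Vec<&str> = parts.iter().map(|p| &**p).collect();
    assert_eq!(words, ["Foo", "Bar"]);

    // The handles are owned: they can be moved apart and dropped in any order.
    let bar = parts.pop().unwrap();
    drop(parts);
    assert_eq!(&*bar, "Bar");
    println!("{}", &*bar);
}
```

The shared buffer is freed once, through its original pointer, when the last handle is dropped. For use across threads, Arc could be swapped in for Rc.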