Search code examples
rustlanguage-lawyerlifetimeunsafe

Is it legal to store a 'static reference to struct internal data in unsafe Rust?


I have the following data structure (simplified):

use std::collections::HashMap;
pub struct StringCache {
    // the hashmap keys point to elements stored in `storage`
    // so never outlive this datastructure. the 'static refs are never handed out
    table: HashMap<&'static str, usize>,

    storage: Vec<Box<str>>,
}

Is it legal/defined behaviour to "lie" to Rust about the lifetime of the references here? It feels like violating the type system to me. Is the public API of this data structure sound? For completeness, here is the full implementation:

use std::mem::transmute;
impl StringCache {
    pub fn intern(&mut self, entry: &str) -> usize {
        if let Some(key) = self.table.get(entry) {
            return *key;
        }
        let idx = self.storage.len();
        self.storage.push(entry.to_owned().into_boxed_str());
        // we cast our refs to 'static here.
        let key = unsafe { transmute::<&str, &'static str>(&self.storage[idx]) };
        self.table.insert(key, idx);
        idx
    }
    pub fn lookup(&self, idx: usize) -> &str {
        &self.storage[idx]
    }
}

Solution

  • TL;DR: This is fine.


    There are three potential problems here:

    1. Creating a reference with lifetime longer than its real lifetime.
    2. Moving the type may invalidate the references.
    3. While dropping the data we may use the strings after they're dropped.

    And none of them is actually a problem. Let's start from the first.

    Creating a reference with a lifetime longer than its actual lifetime is definitely fine. A lot of code in the wild relies of it to be fine, it is not listed under "Behavior considered undefined" in the reference (even though that list is not exhaustive), all current models for code execution in Rust (such as Stacked Borrows or MiniRust) employ this principle (in fact, the very reason for the existence of Stacked Borrows is to not rely on lifetimes for soundness and rather have more fine-grained model), and in UCG#231 it was stated that it is obvious that lifetimes cannot affect optimizations, just not specified currently somewhere.

    So we're coming to the second issue. The question is whether moving StringCache (and therefore, its storage) will invalidate the references because it will move the Vec's elements too. Or with more technical terms, whether moving a Vec retags (a Stacked Borrows term) its elements, asserting their uniqueness.

    Your intuition may say that this is fine, but reality is more complicated. Vec defines its items with Unique, which means that by the letter of the law moving it does invalidate all existing pointers to elements (this is true about Box too). However, lots of code in the wild relies on this to be false, so at least for Vec (and perhaps for Box too) we probably want this to be false. I think it is okay to rely on that. Miri does not give Vec any special treatement either, and as far as I'm aware, the compiler does not optimize based on it too.

    For (3), your current definition (that declares table before storage) is obviously fine because it drops the HashMap first, but even your previous definition (that declared storage first) was fine, because HashMap is declared with #[may_dangle], meaning it promises it will not access its elements (aside from dropping them) in its drop. This is also a stability guarantee, because it is observable, even in code very similar to yours:

    use std::collections::HashMap;
    
    #[derive(Default)]
    struct StringCache<'a> {
        storage: Vec<Box<str>>,
        table: HashMap<&'a str, usize>,
    }
    
    fn main() {
        let mut string_cache = StringCache::default();
        string_cache.storage.push("hello".into());
        string_cache.table.insert(&string_cache.storage[0], 0);
    }
    

    The fact that this code compiles successfully is because HashMap promises to not touch its elements during drop. Otherwise, we could have use-after-free. So HashMap cannot change this fact suddenly.