Search code examples
rustmemory-managementstatic-analysisborrow-checkermutable

Can I Prove Monotonicity of Allocations to the Rust Borrow Checker


I have the following code which does not compile:

// TODO: Return Result, remove `.expect`s
fn to_blender_subfile<'a>(
    filepath: &str,
    transform: Transform,
    visited_file_cache: &'a mut HashMap<&'a str, Vec<Command>>,
    blender: &mut Blender,
) {
    // TODO: Support colors and backface culling

    let commands = visited_file_cache.get(filepath).unwrap();
    // "unwrap" is enforced by the wrapper function

    for command in commands.iter() {
        if let Command::Subfile { filepath, .. } = command {
            if !visited_file_cache.contains_key(filepath.as_str()) {
                let mut file = std::fs::File::open(filepath).expect("Failed to open file");

                let mut file_contents = String::new();

                let contents_len = file
                   .read_to_string(&mut file_contents)
                   .expect("Failed to read file into buffer");

                assert_eq!(contents_len, file_contents.len()); // TODO: Check this


                // "parse_file" is defined elsewhere
                let commands = parse_file(&file_contents);

                if let Some(_) = visited_file_cache.insert(filepath.as_str(), commands) {
                    std::unreachable!();
                }
            }
        }
    }

    // ... rest of function
}

The error I get from rustc is:

cannot borrow `*visited_file_cache` as mutable because it is also borrowed as immutable 

referring to the fact that I both do visited_file_cache.insert(filepath.as_str(), commands) and let commands = visited_file_cache.get(filepath).unwrap(); while claiming visited_file_cache: &'a mut HashMap<&'a str, Vec<Command>>.

My understanding of what's going on, at least at a high level, is that the borrow checker is protecting me from mutating visited_file_cache while also referring to its values for its keys, since in general its keys may survive its values which can lead to dangling pointers.

This is all well and good, however by the nature of the code, if a key-value pair has been inserted into visited_file_cache it cannot be removed (i.e. it lives at least as long as the hashmap itself). I understand that the compiler cannot use this information, hence why it complains, but is there a way to express to the compiler that a region grows monotonically? I could then tell the compiler that visited_file_cache is backed by a monotonic allocator and its insert method is nondestructive, thus getting rid of the above error.

Can the compiler understand such structures, or are there predefined structures in the standard library that enforce the above behavior while using unsafe?

I am aware that I can trivially circumvent this issue by copying strings and making the keys of visited_file_cache owned, and I'm pretty sure I could also use reference counting. However, I don't think the infrastructure for either of those two solutions is necessary for my use case.

N.B. I am new to Rust (haven't looked the Rust Book yet, though I have looked at the Rustonomicon), but I have a C and Haskell background.


Solution

  • First of all, if you only want the code to compile, then indeed you can relatively easily get there with a bit of cloning & owned keys: see the playground.

    It is a bit boring, though, isn't it?

    So, what do we need to be able to iterate & refer to the key & values of a map while modifying the map?

    Leak it, leak it, leak it!

    (Not be mistaken with "lick it, lick it, lick it!", no geologist was hurt in the making of this answer)

    In Rust, it's safe to leak. In fact, there's an explicit function to do so. So we're going to leak all things:

    • The keys will be &'static str.
    • The commands will be &'static [Command].

    Leaking a String can be achieved by going:

    let filename: Box<str> = _string_;
    let filename = Box::leak(filename);
    

    And leaking a slice can be achieved by going:

    let commands: Box<[Command]> = _vec_;
    let commands = Box::leak(commands);
    

    And that's it. We can now have a HashMap<&'static str, &'static [Command]> where filepath: &'static str in Command::Subfile and we're good to go.

    Tweak it.

    Leaking is safe, so it's cool, but... it doesn't really lend itself to repetition. So we need another option.

    Another option would be for the map to guarantee that inserted keys & values will (1) not be moved and (2) not be mutated (while shared).

    What's pretty cool about Rust is that you can, in fact, expressed such invariants (shared vs not-shared) at the type-level, so we get:

    impl<K, V> PushMap<K, V> {
        //  Only reads & `insert_if_new` are possible for the duration of the borrow.
        fn get(&self, key: &K) -> Option<&V>;
    
        //  Returns `value` if the key is already present in the map.
        //
        //  This is unlike `insert` (which returns the previous value), hence the
        //  the different name.
        fn insert_if_new(&self, key: K, value: V) -> Option<(K, V)>;
    
        //  Mirror API of standard HashMap.
        fn get_mut(&mut self, key: &K) -> Option<&mut V>;
        fn insert(&mut self, key: K, value: V) -> Option<V>;
        fn remove(&mut self, key: &K) -> Option<V>;
    }
    

    The use of insert_new does require internal mutability. At the bottom this means UnsafeCell, and the Cell and RefCell wrappers won't cut it here, so we'll need to make our own.

    This also means that it will NOT be possible to shared &PushMap across multiple threads (while it's possible with &HashMap) unless we also make it a concurrent map. It's a trade-off.

    The exact implementation is left as an exercise to the reader. From the literature, a typical hash table implementation, with linked-lists as buckets, will fit the bill, although there's overhead in using linked-lists. Otherwise, an implementation atop the Jagged Array concept is also possible, see the jagged repository for an example.