Search code examples
rustborrow-checkerunsaferayon

Is it sound to use raw pointers to an allocated vector to allow multiple threads to write to nonoverlapping parts?


I have multiple threads doing a computation and want to collect the results into a pre-allocated vector. To turn off the borrow checker, I wrote the function:

fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
    let first_elem = vec.as_ptr() as *mut usize;
    unsafe { *first_elem.add(idx) = val }
}

With that we can fill a vector concurrently (e.g. using Rayon):

let vec = vec![0; 10];
(0..10).into_par_iter().for_each(|i| set_unsync(&vec, i, i));

It compiles, it works, and even Clippy likes it, but is it sound? After reading about things that appear to be sound but actually are Undefined Behavior, I'm unsure. For example, the documentation of the as_ptr method says:

The caller must also ensure that the memory the pointer (non-transitively) points to is never written to (except inside an UnsafeCell) using this pointer or any pointer derived from it.

Strictly speaking, the solution violates this. However, it feels sound to me. If it is not, how can we let multiple threads write to nonoverlapping parts of the same vector without using locks?


Solution

  • Assuming this is your minimal reproducible example:

    use rayon::prelude::*;
    
    fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
        let first_elem = vec.as_ptr() as *mut usize;
        unsafe { *first_elem.add(idx) = val }
    }
    
    fn main() {
    
        let input = vec![2, 3, 9];
        let output = vec![0; 100];
        input.par_iter().for_each(|&i| {
            for j in i * 10..(i + 1) * 10 {
                set_unsync(&output, j, i);
            }
        });
    
        dbg!(output);
    }
    

    If you are asking of whether this code works and always will work, then I'd answer with yes.

    BUT: it violates many rules on how safe and unsafe code should interact with each other.

    If you write a function that is not marked unsafe, you indicate that this method can be abused by users in any way possible without causing undefined behaviour (note that "users" here is not just other people, this also means your own code in safe sections). If you cannot guarantee this, you should mark it unsafe, requiring the caller of the function to mark the invocation as unsafe as well, because the caller then again has to make sure he is using your function correctly. And every point in your code that required a programmer to manually prove that it is free of undefined behaviour must require an unsafe as well. If it's possible to have sections that require a human to prove this, but do not require an unsafe, there is something unsound in your code.

    In your case, the set_unsync function is not marked unsafe, but the following code causes undefined behaviour:

    fn set_unsync(vec: &Vec<usize>, idx: usize, val: usize) {
        let first_elem = vec.as_ptr() as *mut usize;
        unsafe { *first_elem.add(idx) = val }
    }
    
    fn main() {
        let output = vec![0; 5];
    
        set_unsync(&output, 100000000000000, 42);
    
        dbg!(output);
    }
    

    Note that at no point in your main did you need an unsafe, and yet a segfault is happening here.

    Now if you say "but set_unsync is not pub, so no one else can call it. And I, in my par_iter, have ensured that I am using it correctly" - then this is the best indicator that you should mark set_unsync as unsafe. The act of "having to ensure to use the function correctly" is more or less the definition of an unsafe function. unsafe doesn't mean it will break horribly, it just means that the caller has to manually make sure that he is using it correctly, because the compiler can't. It's unsafe from the compiler's point of view.

    Here is an example of how your code could be rewritten in a more sound way. I don't claim that it is 100% sound, because I haven't thought about it enough. But I hope this demonstrates how to cleanly interface between safe and unsafe code:

    use rayon::prelude::*;
    
    // mark as unsafe, as it's possible to provide parameters that
    // cause undefined behaviour
    unsafe fn set_unsync(vec: &[usize], idx: usize, val: usize) {
        let first_elem = vec.as_ptr() as *mut usize;
    
        // No need to use unsafe{} here, as the entire function is already unsafe
        *first_elem.add(idx) = val
    }
    
    // Does not need to be marked `unsafe` as no combination of parameters
    // could cause undefined behaviour.
    // Also note that output is marked `&mut`, which is also crucial.
    // Mutating data behind a non-mutable reference is also considered undefined
    // behaviour.
    fn do_something(input: &[usize], output: &mut [usize]) {
        input.par_iter().for_each(|&i| {
            // This assert is crucial for soundness, otherwise an incorrect value
            // in `input` could cause an out-of-bounds access on `output`
            assert!((i + 1) * 10 <= output.len());
    
            for j in i * 10..(i + 1) * 10 {
                unsafe {
                    // This is the critical point where we interface
                    // from safe to unsafe code.
                    // This call requires the programmer to manually verify that
                    // `set_unsync` never gets called with dangerous parameters.
                    set_unsync(&output, j, i);
                }
            }
        });
    }
    
    fn main() {
        // note that we now have to declare output `mut`, as it should be
        let input = vec![2, 3, 9];
        let mut output = vec![0; 100];
    
        do_something(&input, &mut output);
    
        dbg!(output);
    }
    

    EDIT: This code is unsound. Please refer to @ChayimFriedman's answer instead.