Search code examples
multithreadingrustdirectoryposixreaddir

Why is fs::read_dir() thread safe on POSIX platforms


Some Background

Originally, Rust switched from readdir(3) to readdir_r(3) for thread safety. But readdir_r(3) has some problems, then they changed it back:

So, in the current implementation, they use readdir(3) on most POSIX platforms

#[cfg(any(
    target_os = "android",
    target_os = "linux",
    target_os = "solaris",
    target_os = "fuchsia",
    target_os = "redox",
    target_os = "illumos"
))]
fn next(&mut self) -> Option<io::Result<DirEntry>> {
unsafe {
    loop {
    // As of POSIX.1-2017, readdir() is not required to be thread safe; only
    // readdir_r() is. However, readdir_r() cannot correctly handle platforms
    // with unlimited or variable NAME_MAX.  Many modern platforms guarantee
    // thread safety for readdir() as long an individual DIR* is not accessed
    // concurrently, which is sufficient for Rust.
    super::os::set_errno(0);
    let entry_ptr = readdir64(self.inner.dirp.0);
Thread issue of readdir(3)

The problem of readdir(3) is that its return value (struct dirent *) is a pointer pointing to the internal buffer of the directory stream (DIR), thus can be overwritten by the following readdir(3) calls. So if we have a DIR stream, and share it with multiple threads, with all threads calling readdir(3), which is a race condition.

If we want to safely handle this, an external synchronization is needed.

My question

Then I am curious about what Rust did to avoid such issues. Well, it seems that they just call readdir(3), memcpy the return value to their caller-allocated buffer, and then return. But this function is not marked as unsafe, this makes me confused.

So my question is why is it safe to call fs::read_dir() in multi-threaded programs?

There is a comment stating that it is safe to use it in Rust without extra external synchronization, but I didn't get it...

It requires external synchronization if a particular directory stream may be shared among threads, but I believe we avoid that naturally from the lack of &mut aliasing. Dir is Sync, but only ReadDir accesses it, and only from its mutable Iterator implementation.


OP's edit after 3 months

At the time of writing this question, I was not familiar with multi-threaded programming in Rust. After refining my skill, taking another look at this post makes me realize that it is pretty easy to verify this question:

// With scpped threads
// Does not compile since we can not mutably borrow pwd more than once
use std::{
    fs::read_dir,
    thread::{scope, spawn},
};

fn main() {
    let mut pwd = read_dir(".").unwrap();

    scope(|s| {
        for _ in 1..10 {
            s.spawn(|| {
                let entry = pwd.next().unwrap().unwrap();
                println!("{:?}", entry.file_name());
            });
        }
    })
}
// Use interior mutability to share it with multiple threads
// This code does compile because synchronization is applied (RwLock)
use std::{
    fs::read_dir,
    sync::{Arc, RwLock},
    thread::spawn,
};

fn main() {
    let pwd = Arc::new(RwLock::new(read_dir(".").unwrap()));

    for _ in 1..10 {
        spawn({
            let pwd = Arc::clone(&pwd);
            move || {
                let entry = pwd.write().unwrap().next().unwrap().unwrap();
                println!("{:?}", entry.file_name());
            }
        }).join().unwrap();
    }
}

Solution

  • readdir is not safe when called from multiple threads with the same DIR* dirp parameter (i.e. with the same self.inner.dirp.0 in the Rust case) but it may be called safely with different dirps. Since calling ReadDir::next requires a &mut self, it is guaranteed that nobody else can call it from another thread at the same time on the same ReadDir instance, and so it is safe.