Tags: c++, linux, filesystems, race-condition, stat

race-free directory walk (C++)


I need to walk a directory tree and get stat values for every file. I want to do this safely while the filesystem is being modified.

In Python, the best option is os.fwalk, which gives access to the fd of the directory being traversed; I can then call os.stat with dir_fd (which uses fstatat) and get current stat values. This is as race-free as it can be made on Linux (if the contents of a directory are modified during the walk, I may have to rescan it). In C, there is nftw, which is implemented similarly, and fts, which in glibc uses a plain (l)stat and is therefore racy (it narrows the race window by changing directories, which is inconvenient).
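For comparison, the primitive that os.fwalk plus os.stat(..., dir_fd=fd) relies on can be sketched in C++ for a single directory level: open the directory once, read its entries from that same fd, and stat each name relative to it with fstatat(). This is a minimal sketch, not a full walker; EntryStat and list_with_stat are hypothetical names, and entries that vanish between readdir() and fstatat() are simply skipped.

```cpp
#include <cerrno>
#include <dirent.h>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <system_error>
#include <unistd.h>
#include <utility>
#include <vector>

// Hypothetical result type: an entry name plus its stat values.
struct EntryStat {
    std::string name;
    struct stat st;
};

// Lists one directory and stats each entry with fstatat() relative to the
// directory's own fd (like os.stat(name, dir_fd=fd) in Python), so every
// lookup resolves against the directory actually being read.
std::vector<EntryStat> list_with_stat(const char* path) {
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        throw std::system_error(errno, std::generic_category(), "open");
    DIR* d = fdopendir(fd);  // on success, d owns fd
    if (!d) {
        int err = errno;
        close(fd);
        throw std::system_error(err, std::generic_category(), "fdopendir");
    }
    std::vector<EntryStat> out;
    for (;;) {
        errno = 0;  // readdir() returns NULL both at the end and on error
        struct dirent* e = readdir(d);
        if (!e) {
            int err = errno;
            closedir(d);
            if (err != 0)
                throw std::system_error(err, std::generic_category(), "readdir");
            return out;
        }
        std::string name = e->d_name;
        if (name == "." || name == "..")
            continue;
        EntryStat es{name, {}};
        if (fstatat(dirfd(d), name.c_str(), &es.st, AT_SYMLINK_NOFOLLOW) == 0) {
            out.push_back(std::move(es));
        } else if (errno != ENOENT) {  // ENOENT: removed since readdir(), skip it
            int err = errno;
            closedir(d);
            throw std::system_error(err, std::generic_category(), "fstatat");
        }
    }
}
```

Recursing from here means calling openat() on subdirectory names relative to the same fd instead of building path strings.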

C++ has a new filesystem API (std::filesystem, graduated from Boost.Filesystem), which caches stat values but doesn't expose them, and I need access to st_dev. It isn't a header-only library, so I can't work around that.
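To make the gap concrete, here is a minimal sketch (devices_via_filesystem is a hypothetical name) of getting st_dev while iterating with std::filesystem. directory_entry can report the file type, size, modification time and hard-link count, but has no accessor for st_dev or st_ino, so each entry needs a second lstat() by path. That lookup re-resolves every path component, so it may see a different file than the iterator did, which is exactly the race the question is trying to avoid.

```cpp
#include <filesystem>
#include <sys/stat.h>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Walks `root` recursively and pairs each path with its st_dev.
// directory_entry has no st_dev accessor, so this falls back to a
// path-based lstat(), which can observe a different file than the
// iterator did if the tree changes in between.
std::vector<std::pair<fs::path, dev_t>> devices_via_filesystem(const fs::path& root) {
    std::vector<std::pair<fs::path, dev_t>> out;
    for (const fs::directory_entry& entry : fs::recursive_directory_iterator(root)) {
        struct stat st;
        if (lstat(entry.path().c_str(), &st) != 0)
            continue;  // e.g. removed after the iterator reported it
        out.emplace_back(entry.path(), st.st_dev);
    }
    return out;
}
```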

Am I missing a decent C++ option that uses fstatat and isn't bound by Boost's ideal of not exposing platform-specific calls? Or is my best option to wrap nftw (or even find)?
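If wrapping nftw is the route taken, the main wrinkle is that its callback is a plain function pointer with no user-data argument. A minimal sketch (walk_nftw and WalkFn are hypothetical names) parks a std::function in a thread_local for the duration of the call. It assumes the callback does not throw, since letting an exception unwind through nftw's C frames is not something to rely on, and note that for FTW_NS entries the stat contents are undefined.

```cpp
#include <cerrno>
#include <ftw.h>
#include <functional>
#include <string>
#include <sys/stat.h>
#include <system_error>

// Hypothetical callback type: path, stat values, nftw type flag
// (FTW_F, FTW_D, FTW_SL, FTW_NS, ...); return false to stop the walk.
// Assumed not to throw.
using WalkFn = std::function<bool(const char* path, const struct stat& st, int type)>;

namespace {
// nftw() passes no user data, so the active callback lives here during a walk.
thread_local const WalkFn* current_fn = nullptr;

int trampoline(const char* path, const struct stat* sb, int type, struct FTW*) {
    return (*current_fn)(path, *sb, type) ? 0 : 1;  // non-zero stops nftw
}
}  // namespace

// Returns true if the whole tree was visited, false if the callback stopped it.
bool walk_nftw(const std::string& root, const WalkFn& fn) {
    const WalkFn* saved = current_fn;  // restored afterwards, so nested walks work
    current_fn = &fn;
    // FTW_PHYS: report symlinks as FTW_SL instead of following them.
    int rc = nftw(root.c_str(), trampoline, 64, FTW_PHYS);
    int err = errno;
    current_fn = saved;
    if (rc == -1)
        throw std::system_error(err, std::generic_category(), "nftw");
    return rc == 0;
}
```

The FTW struct (depth and basename offset) is dropped here for brevity; it could be forwarded the same way if needed.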


Solution

  • It turns out it was simple enough to implement.

    I used libposix from dryproject.

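    #include <posix++.h>
    #include <cerrno>      // errno, ENOENT, ENOTDIR, ELOOP, EPERM
    #include <dirent.h>    // DT_DIR, DT_UNKNOWN
    #include <fcntl.h>     // openat, O_DIRECTORY, O_NOFOLLOW, O_NOATIME, AT_SYMLINK_NOFOLLOW
    #include <sys/stat.h>  // struct stat, fstat
    #include <unistd.h>    // close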
    
    class Walker {
    public:
        void walk(posix::directory dir) {
            dir.for_each([this, dir](auto& dirent) {
                if (dirent.name == "." || dirent.name == "..")
                    return;
                if (!handle_dirent(dirent))
                    return;
                struct stat stat;
                if (dirent.type == DT_DIR || dirent.type == DT_UNKNOWN) {
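                    int fd = openat(
                        dir.fd(), dirent.name.c_str(), O_DIRECTORY|O_NOFOLLOW|O_NOATIME);
                    // O_NOATIME fails with EPERM unless we own the directory
                    // (or have CAP_FOWNER), so retry without it
                    if (fd < 0 && errno == EPERM)
                        fd = openat(
                            dir.fd(), dirent.name.c_str(), O_DIRECTORY|O_NOFOLLOW);
                    if (fd < 0) {
                        // ELOOP when O_NOFOLLOW is used on a symlink;
                        // ENOTDIR when the entry is not (or no longer) a directory
                        if (errno == ENOTDIR || errno == ELOOP)
                            goto enotdir;
                        if (errno == ENOENT)
                            goto enoent;
                        posix::throw_error(
                            "openat", "%d, \"%s\"", dir.fd(), dirent.name.c_str());
                    }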
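                    posix::directory dir1(fd);
                    // stat through the fd just opened, so the values describe
                    // the directory that will actually be walked
                    fstat(fd, &stat);
                    if (handle_directory(dirent, fd, stat))
                        walk(dir1);
                    // NB: assumes posix::directory(fd) does not close fd itself;
                    // fd also leaks if walk() throws
                    close(fd);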
                    return;
                }
    enotdir:
                try {
                    dir.stat(dirent.name.c_str(), stat, AT_SYMLINK_NOFOLLOW);
                } catch (const posix::runtime_error &error) {
                    if (error.number() == ENOENT)
                        goto enoent;
                    throw;
                }
                handle_file(dirent, stat);
                return;
    enoent:
                handle_missing(dirent);
            });
        }
    protected:
        /* return value: whether to stat the entry (false skips it entirely) */
        virtual bool handle_dirent(const posix::directory::entry&) { return true; }
        /* return value: whether to recurse
         * stat will refer to a directory, dirent info may be obsolete */
        virtual bool handle_directory(
                const posix::directory::entry &dirent,
                const int fd, const struct stat&) { return true; }
        /* stat might refer to a directory in case of a race;
         * it still won't be recursed into.  dirent may be obsolete. */
        virtual void handle_file(
                const posix::directory::entry &dirent,
                const struct stat&) {}
        /* called when the entry vanished between readdir and open/stat (a race) */
        virtual void handle_missing(
                const posix::directory::entry &dirent) {}
    };
    

    Performance is identical to GNU find's when the base class is compared against find -size $RANDOM; the random size suppresses output and forces find to stat every file, not just the DT_DIR candidates.
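For readers without libposix, here is a minimal sketch of the same control flow on plain POSIX calls. PosixWalker is a hypothetical name, and its hooks take the entry name instead of a libposix entry object. It keeps the key property: every lookup is relative to the fd of the directory being read, via openat() and fstatat(). Two deliberate differences from the code above: a std::unique_ptr owning each DIR* closes the directory fd even if a hook throws, and O_NOATIME is left out because it makes openat() fail with EPERM on directories the caller doesn't own.

```cpp
#include <cerrno>
#include <dirent.h>
#include <fcntl.h>
#include <memory>
#include <string>
#include <sys/stat.h>
#include <system_error>
#include <unistd.h>

// Same control flow as Walker, using only POSIX calls (no libposix).
class PosixWalker {
public:
    virtual ~PosixWalker() = default;

    // Walks the tree below `path`; the root itself is not reported.
    void walk(const std::string& path) {
        int fd = open(path.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (fd < 0) fail("open");
        DirPtr d = adopt(fd);
        walk_stream(d.get());
    }

protected:
    virtual bool handle_dirent(const std::string&, unsigned char /*d_type*/) { return true; }
    virtual bool handle_directory(const std::string&, int /*fd*/, const struct stat&) { return true; }
    virtual void handle_file(const std::string&, const struct stat&) {}
    virtual void handle_missing(const std::string&) {}

private:
    using DirPtr = std::unique_ptr<DIR, int (*)(DIR*)>;

    [[noreturn]] static void fail(const char* what) {
        throw std::system_error(errno, std::generic_category(), what);
    }

    // Wraps an open directory fd in a DIR stream that owns (and will close) it.
    static DirPtr adopt(int fd) {
        DIR* d = fdopendir(fd);
        if (!d) {
            int err = errno;
            close(fd);
            throw std::system_error(err, std::generic_category(), "fdopendir");
        }
        return DirPtr(d, closedir);
    }

    void walk_stream(DIR* d) {
        const int dfd = dirfd(d);
        for (;;) {
            errno = 0;  // distinguishes end-of-directory from a readdir() error
            struct dirent* e = readdir(d);
            if (!e) {
                if (errno != 0) fail("readdir");
                return;
            }
            const std::string name = e->d_name;
            const unsigned char type = e->d_type;
            if (name == "." || name == ".." || !handle_dirent(name, type))
                continue;
            struct stat st;
            if (type == DT_DIR || type == DT_UNKNOWN) {
                int fd = openat(dfd, name.c_str(),
                                O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
                if (fd >= 0) {
                    DirPtr sub = adopt(fd);  // fd is closed when sub goes away
                    if (fstat(fd, &st) != 0) fail("fstat");
                    if (handle_directory(name, fd, st))
                        walk_stream(sub.get());
                    continue;
                }
                if (errno == ENOENT) { handle_missing(name); continue; }
                if (errno != ENOTDIR && errno != ELOOP) fail("openat");
                // Not a directory (or a symlink): treat it as a file below.
            }
            if (fstatat(dfd, name.c_str(), &st, AT_SYMLINK_NOFOLLOW) == 0)
                handle_file(name, st);
            else if (errno == ENOENT)
                handle_missing(name);
            else
                fail("fstatat");
        }
    }
};
```

As with Walker, usage is to subclass it, override the hooks you need (for example, collect st.st_dev in handle_file), and call walk() on the root directory.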