I need to walk a directory tree and get stat values for every file. I want to do this safely while the filesystem is being modified.
In Python, the best option is os.fwalk, which gives access to the fd for the directory being traversed; I can then os.stat with the dir_fd (fstatat) and get current stat values. This is as race-free as it can be made on Linux (if the contents of this directory are being modified, I may have to rescan it). In C, there is nftw, which is implemented similarly, and fts, which in glibc uses a plain (l)stat and therefore is racy (it reduces the race window by changing directories, which is inconvenient).
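To make the dir_fd idea concrete, here is a minimal sketch of the underlying calls for a single directory: open it once, then stat each entry relative to that fd with fstatat. The names Entry and stat_entries are hypothetical, and the error handling is deliberately thin (entries that vanish between readdir and fstatat are simply skipped).

```cpp
#include <cassert>
#include <cerrno>
#include <dirent.h>
#include <fcntl.h>
#include <string>
#include <sys/stat.h>
#include <system_error>
#include <unistd.h>
#include <vector>

// Hypothetical result type: an entry name plus its lstat-style metadata.
struct Entry {
    std::string name;
    struct stat st;
};

// Hypothetical helper: stat every entry of one directory relative to its fd.
// Renaming the directory (or its parents) during the scan cannot redirect
// the lookups, because they never go through the path again.
std::vector<Entry> stat_entries(const std::string& path) {
    assert(!path.empty());
    int fd = open(path.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        throw std::system_error(errno, std::generic_category(), "open");
    DIR* dir = fdopendir(fd);  // on success the stream owns fd
    if (!dir) {
        int saved = errno;
        close(fd);
        throw std::system_error(saved, std::generic_category(), "fdopendir");
    }
    std::vector<Entry> out;
    while (struct dirent* de = readdir(dir)) {
        const std::string name = de->d_name;
        if (name == "." || name == "..")
            continue;
        struct stat st;
        // AT_SYMLINK_NOFOLLOW gives lstat semantics relative to fd.
        if (fstatat(fd, name.c_str(), &st, AT_SYMLINK_NOFOLLOW) == 0)
            out.push_back({name, st});
        // On failure (typically ENOENT) the entry vanished; skip it.
    }
    closedir(dir);  // also closes fd
    return out;
}
```

Each returned Entry carries the full struct stat, so st_dev is available to the caller.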
C++ has a new filesystem API graduated from Boost, which caches stat values but doesn't expose them (and I need access to st_dev). This isn't purely a header library, so I can't work around that.
Am I missing a decent C++ option that uses fstatat and isn't bound by Boost's ideal of not exposing platform-specific calls? Or is my best option to wrap nftw (or even find)?
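For comparison, wrapping nftw from C++ might look roughly like the sketch below. The names NftwVisitor and walk_with_nftw are hypothetical. Since the nftw callback has no user-data parameter, the visitor is passed through a thread_local pointer; the visitor must not throw, because the exception would have to unwind through C library code.

```cpp
#include <cassert>
#include <ftw.h>
#include <functional>
#include <string>
#include <sys/stat.h>

// Hypothetical visitor type: return true to continue, false to stop the walk.
using NftwVisitor =
    std::function<bool(const char* path, const struct stat& st, int typeflag)>;

namespace {
// nftw offers no context argument, so the active visitor lives here.
thread_local const NftwVisitor* current_visitor = nullptr;

int trampoline(const char* path, const struct stat* st, int typeflag,
               struct FTW*) {
    // A non-zero return value makes nftw stop and return that value.
    return (*current_visitor)(path, *st, typeflag) ? 0 : 1;
}
}  // namespace

// Returns 0 when the whole tree was walked, 1 when the visitor stopped the
// walk, and -1 when nftw itself failed (errno is set in that case).
int walk_with_nftw(const std::string& root, const NftwVisitor& visit) {
    assert(static_cast<bool>(visit));
    current_visitor = &visit;
    // FTW_PHYS: report symlinks themselves instead of following them.
    // 16 is an arbitrary cap on simultaneously open directory fds.
    int rc = nftw(root.c_str(), trampoline, 16, FTW_PHYS);
    current_visitor = nullptr;
    return rc;
}
```

The visitor receives the struct stat, so st_dev is reachable, but the directory fd is not exposed. When typeflag is FTW_NS the stat call failed and the contents of st are unspecified.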
It turns out it was simple enough to implement.
I used libposix from dryproject.
    #include <cerrno>      // errno, ENOENT, ENOTDIR, ELOOP
    #include <dirent.h>    // DT_DIR, DT_UNKNOWN
    #include <fcntl.h>     // openat, O_* and AT_* flags
    #include <sys/stat.h>  // struct stat, fstat
    #include <unistd.h>    // close
    #include <posix++.h>

    class Walker {
    public:
        void walk(posix::directory dir) {
            dir.for_each([this, dir](auto& dirent) {
                if (dirent.name == "." or dirent.name == "..")
                    return;
                if (!handle_dirent(dirent))
                    return;
                struct stat stat;
                if (dirent.type == DT_DIR || dirent.type == DT_UNKNOWN) {
                    // Open relative to the parent fd, never following symlinks.
                    int fd = openat(
                        dir.fd(), dirent.name.c_str(), O_DIRECTORY|O_NOFOLLOW|O_NOATIME);
                    if (fd < 0) {
                        // ELOOP when O_NOFOLLOW is used on a symlink
                        if (errno == ENOTDIR || errno == ELOOP)
                            goto enotdir;
                        if (errno == ENOENT)
                            goto enoent;
                        posix::throw_error(
                            "openat", "%d, \"%s\"", dir.fd(), dirent.name.c_str());
                    }
                    posix::directory dir1(fd);
                    // fstat on the open fd describes exactly the directory
                    // that will be recursed into.
                    fstat(fd, &stat);
                    if (handle_directory(dirent, fd, stat))
                        walk(dir1);
                    close(fd);
                    return;
                }
            enotdir:
                // Not a directory (or no longer one): lstat relative to the parent fd.
                try {
                    dir.stat(dirent.name.c_str(), stat, AT_SYMLINK_NOFOLLOW);
                } catch (const posix::runtime_error &error) {
                    if (error.number() == ENOENT)
                        goto enoent;
                    throw;
                }
                handle_file(dirent, stat);
                return;
            enoent:
                // The entry vanished between readdir and open/stat.
                handle_missing(dirent);
            });
        }
    protected:
        /* return value: whether to stat */
        virtual bool handle_dirent(const posix::directory::entry&) { return true; }
        /* return value: whether to recurse
         * stat will refer to a directory, dirent info may be obsolete */
        virtual bool handle_directory(
            const posix::directory::entry &dirent,
            const int fd, const struct stat&) { return true; }
        /* stat might refer to a directory in case of a race;
         * it still won't be recursed into. dirent may be obsolete. */
        virtual void handle_file(
            const posix::directory::entry &dirent,
            const struct stat&) {}
        /* in case of a race */
        virtual void handle_missing(
            const posix::directory::entry &dirent) {}
    };
Performance is identical to GNU find (when comparing with the base class, using -size $RANDOM to suppress output and force find to stat all files, not just DT_DIR candidates).
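For readers who don't want the libposix dependency, the same openat/fstat/fstatat pattern can be sketched with plain POSIX calls. This is not the code that was benchmarked above; Visit and walk_fd are hypothetical names, and errors other than a vanished entry are handled by falling back to a plain fstatat rather than being reported.

```cpp
#include <cassert>
#include <cerrno>
#include <dirent.h>
#include <fcntl.h>
#include <functional>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical callback: parent directory fd, entry name, and its stat.
using Visit = std::function<void(int parent_fd, const std::string& name,
                                 const struct stat& st)>;

// Recursively walks the directory open on dir_fd and takes ownership of it.
// One fd stays open per directory level, so very deep trees need a high
// fd limit.
void walk_fd(int dir_fd, const Visit& visit) {
    assert(dir_fd >= 0);
    DIR* dir = fdopendir(dir_fd);
    if (!dir) {
        close(dir_fd);
        return;
    }
    while (struct dirent* de = readdir(dir)) {
        const std::string name = de->d_name;
        if (name == "." || name == "..")
            continue;
        struct stat st;
        if (de->d_type == DT_DIR || de->d_type == DT_UNKNOWN) {
            // O_NOFOLLOW: a symlink swapped in for the directory is refused.
            int fd = openat(dir_fd, name.c_str(),
                            O_RDONLY | O_DIRECTORY | O_NOFOLLOW | O_CLOEXEC);
            if (fd >= 0) {
                // fstat describes exactly the directory we recurse into.
                if (fstat(fd, &st) == 0) {
                    visit(dir_fd, name, st);
                    walk_fd(fd, visit);  // closes fd
                } else {
                    close(fd);
                }
                continue;
            }
            if (errno == ENOENT)
                continue;  // entry vanished in the meantime
            // ENOTDIR, ELOOP, EACCES, ...: fall through and stat it instead.
        }
        if (fstatat(dir_fd, name.c_str(), &st, AT_SYMLINK_NOFOLLOW) == 0)
            visit(dir_fd, name, st);
    }
    closedir(dir);  // also closes dir_fd
}
```

A caller would open the root with open(path, O_RDONLY | O_DIRECTORY) and pass the resulting fd together with a lambda that records st.st_dev or whatever else it needs.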