Imagine a directory tree (on Linux):
user@computer:~/demo> find .
.
./test1
./test1/test1_a
./test1/test1_a/somefile_1a
./test1/test1_b
./test1/test1_b/somefile_1b
./test0
./test0/test0_a
./test0/test0_a/somefile_0a
./test0/test0_b
./test0/test0_b/somefile_0b
Scenario: I determine all available meta info about every directory and file in that tree (mtime, ctime, inode, size, checksums on file contents ...), including the highest-level directory, demo. I store this information. Then some files or directories are changed (modified, newly created, or deleted). Using the previously determined and stored information, I now want to figure out what has changed.
My solution so far: I traverse the entire tree, look for changed meta information, and process it. Above a certain tree size, traversing the tree and looking at every directory and file becomes quite time-consuming, even if you look at pure meta info only (i.e. ctime, mtime etc., NOT file content checksums). One can optimize such a traversal only to a certain degree (e.g. by reading the meta info of each file and folder only once per traversal instead of multiple times). At the end of the day, I/O speed becomes the bottleneck.
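For reference, the full-traversal approach described above can be sketched roughly as follows. This is a minimal illustration, not my actual code: the function names snapshot and diff_snapshots are made up for this example, and it compares only inode, size, mtime and ctime (no content checksums). It uses os.scandir so that each entry is stat'ed once per traversal.

```python
import os


def snapshot(root):
    """Map every path under root (root included) to a metadata tuple.

    The tuple is (inode, size, mtime_ns, ctime_ns). The names and the choice
    of fields are assumptions made for this sketch.
    """
    def fields(st):
        return (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)

    meta = {root: fields(os.lstat(root))}
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # DirEntry caches the stat result, so each entry is
                # stat'ed at most once per traversal.
                meta[entry.path] = fields(entry.stat(follow_symlinks=False))
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return meta


def diff_snapshots(old, new):
    """Return (created, deleted, changed) path lists between two snapshots."""
    created = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return created, deleted, changed
```

Calling snapshot("demo") before and after the modification and passing both results to diff_snapshots yields the three lists. The cost is still one stat per entry in the tree, which is exactly the I/O-bound part I would like to avoid.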
Question: What options do I have (on Unix / Linux file systems) to look for changes in my tree without traversing all of it? I.e. is there any information stored for demo which tells me / indicates in some way that something below it (e.g. somefile_1b) has been changed? Are there any specific filesystems (EXT*, XFS, ZFS, ...) offering features of this kind?
Note: I am aware of the option of running a background process for monitoring changes to the filesystem. It would eliminate the need for a full traversal of my tree, though I am more interested in options which do NOT require a background monitoring process (if an option of this kind exists at all).
ZFS provides the capability via zfs diff ...
Per the Oracle Solaris 11.2 documentation:
Identifying ZFS Snapshot Differences (zfs diff)
You can determine ZFS snapshot differences by using the zfs diff command.
For example, assume that the following two snapshots are created:
$ ls /tank/home/tim
fileA
$ zfs snapshot tank/home/tim@snap1
$ ls /tank/home/tim
fileA fileB
$ zfs snapshot tank/home/tim@snap2
For example, to identify the differences between two snapshots, use syntax similar to the following:
$ zfs diff tank/home/tim@snap1 tank/home/tim@snap2
M       /tank/home/tim/
+       /tank/home/tim/fileB
In the output, the M indicates that the directory has been modified. The + indicates that fileB exists in the later snapshot.
The R in the following output indicates that a file in a snapshot has been renamed.
$ mv /tank/cindy/fileB /tank/cindy/fileC
$ zfs snapshot tank/cindy@snap2
$ zfs diff tank/cindy@snap1 tank/cindy@snap2
M       /tank/cindy/
R       /tank/cindy/fileB -> /tank/cindy/fileC
Note that this only compares two snapshots, so you need to be able to create ZFS snapshots to use it effectively.
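If you want to process the result programmatically, the zfs diff output shown above is simple enough to parse. The following is a rough sketch only: the helper names parse_zfs_diff and run_zfs_diff are made up for this example, and the parser assumes the default output format quoted from the documentation (a change type, whitespace, a path, and " -> " between old and new path for renames).

```python
import subprocess


def parse_zfs_diff(output):
    """Parse default `zfs diff` output into (kind, path, new_path) tuples.

    kind is the change type as printed (e.g. "M", "+", "R").
    new_path is None unless the line describes a rename.
    Assumes the layout quoted above; not a complete parser.
    """
    changes = []
    for line in output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        kind, rest = parts
        if kind == "R" and " -> " in rest:
            old_path, new_path = rest.split(" -> ", 1)
            changes.append((kind, old_path, new_path))
        else:
            changes.append((kind, rest, None))
    return changes


def run_zfs_diff(snap1, snap2):
    """Hypothetical wrapper: run `zfs diff snap1 snap2` and parse its output.

    Requires a ZFS system and sufficient privileges to run zfs diff.
    """
    result = subprocess.run(
        ["zfs", "diff", snap1, snap2],
        capture_output=True, text=True, check=True,
    )
    return parse_zfs_diff(result.stdout)
```

For example, run_zfs_diff("tank/home/tim@snap1", "tank/home/tim@snap2") would return one tuple per changed path without any traversal on your side.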