How to determine whether files have been changed in a directory tree without traversing the entire tree?


Imagine a directory tree (on Linux):

user@computer:~/demo> find .
.
./test1
./test1/test1_a
./test1/test1_a/somefile_1a
./test1/test1_b
./test1/test1_b/somefile_1b
./test0
./test0/test0_a
./test0/test0_a/somefile_0a
./test0/test0_b
./test0/test0_b/somefile_0b

Scenario: I collect all available metadata about every directory and file in that tree (mtime, ctime, inode, size, checksums of file contents ...), including the top-level directory, demo, and store it. Later, one or more files or directories are changed (modified, newly created, or deleted). Using the stored information, I now want to figure out what has changed.

My solution so far: I traverse the entire tree, look for changed metadata, then process it. Above a certain size, traversing a tree and looking at every directory and file becomes quite time-consuming, even if you look at metadata only (i.e. ctime, mtime etc., NOT file content checksums). Such a traversal can be optimized only to a certain degree (e.g. by reading the metadata of each file and folder only once per traversal instead of multiple times); at the end of the day, I/O speed becomes the bottleneck.
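For reference, the full-traversal baseline described above can be sketched roughly as follows. This is only an illustration of the approach, not my actual code: the names Meta, take_snapshot and compare are made up for this example, and it assumes that comparing inode, size, mtime and ctime is enough to call an entry "changed". Each entry is stat'ed once per traversal via os.scandir.

```python
import os
from typing import Dict, NamedTuple, Set, Tuple


class Meta(NamedTuple):
    """Hypothetical record of the metadata kept per entry."""
    inode: int
    size: int
    mtime_ns: int
    ctime_ns: int


def _meta(st: os.stat_result) -> Meta:
    return Meta(st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def take_snapshot(root: str) -> Dict[str, Meta]:
    """Walk the whole tree once, storing metadata keyed by path relative to root."""
    result: Dict[str, Meta] = {".": _meta(os.lstat(root))}
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # One lstat per entry; symlinks are recorded, not followed.
                st = entry.stat(follow_symlinks=False)
                result[os.path.relpath(entry.path, root)] = _meta(st)
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
    return result


def compare(old: Dict[str, Meta],
            new: Dict[str, Meta]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (created, deleted, changed) paths between two snapshots."""
    old_paths, new_paths = set(old), set(new)
    created = new_paths - old_paths
    deleted = old_paths - new_paths
    changed = {p for p in old_paths & new_paths if old[p] != new[p]}
    return created, deleted, changed
```

Calling take_snapshot("demo") before and after the changes and passing both results to compare yields the three sets. The cost is still one stat call per entry in the tree, which is exactly the part I would like to avoid.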

Question: What options do I have (on Unix / Linux file systems) to look for changes in my tree without traversing all of it? I.e. is there any information stored for demo which tells me / indicates in some way that something below it (e.g. somefile_1b) has been changed? Are there any specific filesystems (EXT*, XFS, ZFS, ...) offering features of this kind?

Note: I am aware of the option of running a background process for monitoring changes to the filesystem. It would eliminate the need for a full traversal of my tree, though I am more interested in options which do NOT require a background monitoring process (if an option of this kind exists at all).


Solution

  • ZFS provides the capability via zfs diff ... Per the Oracle Solaris 11.2 documentation:

    Identifying ZFS Snapshot Differences (zfs diff)

    You can determine ZFS snapshot differences by using the zfs diff command.

    For example, assume that the following two snapshots are created:

    $ ls /tank/home/tim
    fileA
    $ zfs snapshot tank/home/tim@snap1
    $ ls /tank/home/tim
    fileA  fileB
    $ zfs snapshot tank/home/tim@snap2
    

    For example, to identify the differences between two snapshots, use syntax similar to the following:

    $ zfs diff tank/home/tim@snap1 tank/home/tim@snap2
    M       /tank/home/tim/
    +       /tank/home/tim/fileB
    

    In the output, the M indicates that the directory has been modified. The + indicates that fileB exists in the later snapshot.

    The R in the following output indicates that a file in a snapshot has been renamed.

    $ mv /tank/cindy/fileB /tank/cindy/fileC
    $ zfs snapshot tank/cindy@snap2
    $ zfs diff tank/cindy@snap1 tank/cindy@snap2
    M       /tank/cindy/
    R       /tank/cindy/fileB -> /tank/cindy/fileC
    

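    Note that zfs diff always needs a snapshot as its starting point: it compares a snapshot either with a later snapshot or with the current state of the filesystem. So you must be able to create ZFS snapshots to use this effectively.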
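If the result is to be processed by a program rather than read by a person, the output shown above can be turned into structured records. The following is a minimal sketch, assuming the output looks like the quoted examples (a change type, whitespace, then the path, with renames written as "old -> new"). The names Change, parse_zfs_diff and zfs_diff are hypothetical helpers for this example and not part of ZFS, and a path that itself contains " -> " would be split incorrectly.

```python
import subprocess
from typing import List, NamedTuple, Optional


class Change(NamedTuple):
    """One line of zfs diff output (hypothetical record type)."""
    kind: str                       # "M", "+", "-" or "R"
    path: str
    new_path: Optional[str] = None  # only set for renames


def parse_zfs_diff(output: str) -> List[Change]:
    """Parse output in the format shown in the quoted documentation."""
    changes: List[Change] = []
    for line in output.splitlines():
        parts = line.split(None, 1)  # change type, then the rest of the line
        if len(parts) != 2:
            continue                 # skip blank or incomplete lines
        kind, path = parts[0], parts[1].strip()
        if kind == "R" and " -> " in path:
            old, new = path.split(" -> ", 1)
            changes.append(Change(kind, old, new))
        else:
            changes.append(Change(kind, path))
    return changes


def zfs_diff(older: str, newer: str) -> List[Change]:
    """Run zfs diff on two snapshot names; needs ZFS and sufficient privileges."""
    completed = subprocess.run(
        ["zfs", "diff", older, newer],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_diff(completed.stdout)
```

For example, zfs_diff("tank/home/tim@snap1", "tank/home/tim@snap2") would return one Change per line of the output above, without walking the tree in user space.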