filesize du

du summary isn't equal to sum of the elements


The directory contains regular files in regular directories, with no symlinks and no remote filesystems (it is actually a maildir++ store, so not even sparse files are expected). I don't see how the sum of the individual directory sizes can be significantly larger than the total that du reports:

$ du * .[a-zA-Z]* -bsc | tail -n1
2722800257      total

$ du * .[a-zA-Z]* -b | awk '{sum+=$1} END {print sum}'
3341577554

Reality seems to match the larger number.


Solution

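  • Your second command du -b ... | awk ... overstates the total because it counts subdirectory sizes multiple times. Without -s, du prints one line for every directory it visits, and each line already includes everything beneath that directory. Each subdirectory's size is therefore counted once on its own line, then again as part of the size of each of its ancestor directories. The first command uses -s, which prints a single line per argument, so nothing is counted twice.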

    It's easier to see what's happening in a small example like this, on a filesystem where an empty directory happens to consume 4 KiB (4096 bytes):

    $ mkdir -p foo/bar/baz
    
    $ du -bsc foo
    12288   foo
    12288   total
    
    $ du -b foo
    4096    foo/bar/baz
    8192    foo/bar
    12288   foo
    
    $ du -b foo | awk '{t += $1} END {print t}'
    24576
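
If a per-directory listing that can safely be added up is what you want, one option is the -S (--separate-dirs) flag of GNU du, which makes each directory's line exclude the sizes of its subdirectories. The sketch below assumes GNU du and awk; the function names sum_separate and du_total are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Sketch, assuming GNU du (-b and -S are GNU options).
# sum_separate and du_total are hypothetical helper names.

# Add up per-directory sizes without double counting.
# With -S, each line covers only the directory itself and the files
# directly inside it, so every byte appears on exactly one line.
sum_separate() {
    du -bS "$@" | awk '{t += $1} END {print t}'
}

# The grand total that du itself reports for the same arguments.
du_total() {
    du -bsc "$@" | tail -n1 | awk '{print $1}'
}
```

On the foo tree from the example above, sum_separate foo and du_total foo should print the same number, whereas the plain du -b foo | awk ... sum comes out larger.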