ruby terminal filesize ls du

Total file size of a directory differs by a large margin between ruby -e, du -ach, and the ls -al "total"


ls | ruby -ne 'BEGIN { a = [] }; a << File.size($_.chomp); END { puts a.sum }'

The code above gets the file size of each file, puts it into an array, and prints the sum.

The value returned is very different from:

du -ach

And both values are very different from the Total displayed by:

ls -al

There are no hidden files.

I am on macOS.


Solution

  • If du is showing you a lot of 4K and 8K files, that is because it reports the space each file occupies on disk, not the length of its content. For performance, storage on disk is allocated in blocks. A typical block these days is 4K, so even a single byte takes up a full block.

    $ echo '1' > this
    
    $ hexdump this
    0000000 31 0a                                          
    0000002
    
    $ ls -l this
    -rw-r--r-- 1 schwern staff 2 Dec  5 15:16 this
    
    $ du -h this
    4.0K    this
    
    $ du --apparent-size -h this
    2   this
    
    $ ruby -e 'puts File.size(ARGV[0])' this
    2
    

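    The file in question has 2 bytes of content: the character 1 and a newline. ls -l and File.size both report that content size, 2 bytes. Note that --apparent-size is a GNU du option, so the du that ships with macOS may not accept it.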

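    du, by default, reports the disk space allocated to the file, in whole blocks. This is because it is a Disk Usage tool and you want to know the true amount of disk taken up. Those 2 bytes take up 4K of disk. 1000 2-byte files will take about 4000K, not 2000 bytes. The "total" line printed by ls -l is likewise a count of allocated blocks rather than a sum of byte sizes, which is why it matches neither of the other two numbers.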

    For this reason, many programs will avoid having many tiny files and instead save disk space by packing them together into a single image file. A simple example is Git packfiles.
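
The difference between the two kinds of size can also be seen from Ruby itself. The following is a minimal sketch, not a replacement for du: the method name size_totals is made up for illustration, and it assumes that File::Stat#blocks counts 512-byte units, which is the usual convention on macOS and Linux but is not guaranteed everywhere. It sums the content size (what File.size and ls -l show) and the allocated size (roughly what du shows) for the regular files directly inside one directory.

```ruby
# Assumption: File::Stat#blocks is expressed in 512-byte units
# (the common st_blocks convention on macOS and Linux).
BLOCK_UNIT = 512

# Hypothetical helper: returns the summed content size and the summed
# allocated size, in bytes, for regular files directly inside dir.
def size_totals(dir)
  apparent = 0
  allocated = 0
  Dir.children(dir).each do |name|
    stat = File.lstat(File.join(dir, name))
    next unless stat.file?                        # skip directories, symlinks, devices
    apparent += stat.size                         # bytes of content
    allocated += (stat.blocks || 0) * BLOCK_UNIT  # blocks is nil on some platforms
  end
  { apparent: apparent, allocated: allocated }
end

if __FILE__ == $PROGRAM_NAME
  totals = size_totals(ARGV[0] || ".")
  puts "apparent:  #{totals[:apparent]} bytes"
  puts "allocated: #{totals[:allocated]} bytes"
end
```

In a directory full of tiny files, the allocated figure will typically be much larger than the apparent one, which is the gap between the Ruby one-liner and du described above. The exact allocated number depends on the filesystem.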