Why would a TAR file be smaller than its contents?


I have a directory I’m archiving:

$ du -s oldcode
1400848
$ tar cf oldcode.tar oldcode

So the directory takes up about 1.4 GB. The archive is significantly smaller, though:

$ ls -l oldcode.tar
-rw-r--r-- 1 ieure ieure 940339200 2002-01-30 10:33 oldcode.tar

Only 897 MB. It’s not compressed in any way:

$ file oldcode.tar
oldcode.tar: POSIX tar archive

Why is the tar file smaller than its contents?


Solution

  • The difference comes from how the filesystem allocates space.

    In a nutshell, your disk is divided into clusters (also called blocks). Each cluster has a fixed size of, say, 4 KB, and a file always occupies a whole number of clusters. If you store a 1 KB file, it takes up a full cluster and the remaining 3 KB go unused. The exact details vary between filesystems, but most work this way. du reports the space allocated on disk, so its total includes this unused space.

    3 KB of wasted space is not much for a single file, but if you have lots of very small files, the waste can become a significant part of the disk usage.

    Inside the tar archive, the files are not stored in clusters but one after another, each padded only to a multiple of 512 bytes. That's where the difference comes from.
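
To make the arithmetic concrete, here is a rough shell sketch. It is an estimate, not a measurement of the directory in the question. It assumes 4096-byte filesystem blocks and tar's usual layout of one 512-byte header per file with the data padded to a multiple of 512 bytes. The helper names round_up, disk_usage, and tar_usage, and the three file sizes, are made up for illustration.

```shell
#!/bin/sh
# Estimate the space a set of files takes on disk versus inside a tar archive.
# Assumptions (not from the original post): 4096-byte filesystem blocks, and
# a 512-byte tar header per file with data padded to a multiple of 512 bytes.

# round_up SIZE BLOCK: round SIZE up to the next multiple of BLOCK.
round_up() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# disk_usage SIZE...: bytes allocated on disk for files of the given sizes.
disk_usage() {
    total=0
    for size in "$@"; do
        total=$(( total + $(round_up "$size" 4096) ))
    done
    echo "$total"
}

# tar_usage SIZE...: bytes the same files take inside a tar archive,
# ignoring the end-of-archive padding.
tar_usage() {
    total=0
    for size in "$@"; do
        total=$(( total + 512 + $(round_up "$size" 512) ))
    done
    echo "$total"
}

# Three hypothetical files of 1000, 100, and 5000 bytes.
echo "on disk: $(disk_usage 1000 100 5000) bytes"
echo "in tar:  $(tar_usage 1000 100 5000) bytes"
```

For these three sizes the sketch prints 16384 bytes on disk against 8192 bytes in the archive. On a real directory, GNU du's --apparent-size option reports the sum of the file sizes instead of the allocated space, which should make the same gap visible.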