bash, filesize, du

Discrepancy between the size of file created and size displayed by du -sh


I had to create a random file of 10 GB, which I can do using dd or fallocate, but the size shown by du -sh is twice the size I created:

$ dd bs=1MB count=10000 if=/dev/zero of=foo
10000+0 records in
10000+0 records out
10000000000 bytes (10 GB, 9.3 GiB) copied, 4.78419 s, 2.1 GB/s
$ du -sh foo
19G     foo
$ ls -sh foo 
19G foo
$ fallocate -l 10G bar
$ du -sh bar
20G     bar
$ ls -sh bar
20G bar

Can someone please explain this apparent discrepancy to me?


Solution

  • Wikipedia says this about GPFS ...

    The system stores data on standard block storage volumes, but includes an internal RAID layer that can virtualize those volumes for redundancy and parallel access much like a RAID block storage system.

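    I conclude that there is at least one non-visible duplicate of every file, so each file actually occupies twice the space of its content. In other words, the underlying RAID layer imposes the double usage. The numbers fit: dd wrote 10,000,000,000 bytes (about 9.3 GiB), which doubles to about 18.6 GiB, shown by du -sh as 19G, while fallocate -l 10G allocates 10 GiB, which doubles to 20G.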

    That would explain it: I have created a similarly large file with dd for other purposes on an ext4 filesystem, and there the OS reports a size matching what dd wrote, as expected (no RAID in effect on my drive).

    Your observation that stat reports the correct file size, matching what dd wrote, is consistent with the explanation above.
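
To check which of the two numbers you are looking at, it may help to compare a file's apparent size (its byte length) with the space allocated for it. The following is only a minimal sketch, assuming GNU coreutils (stat -c and du --apparent-size are GNU options); it uses a small 1 MB scratch file in a temporary directory rather than 10 GB, and the variable names are just for illustration.

```shell
#!/bin/sh
# Sketch: apparent size vs. allocated size of a file (GNU coreutils assumed).
tmp=$(mktemp -d)                  # scratch directory, removed at the end
f="$tmp/foo"

# Write 1000 blocks of 1000 bytes of zeros: 1,000,000 bytes in total.
dd bs=1000 count=1000 if=/dev/zero of="$f" 2>/dev/null

apparent=$(stat -c %s "$f")       # byte length of the file, as shown by ls -l
blocks=$(stat -c %b "$f")         # number of blocks allocated to the file
blksize=$(stat -c %B "$f")        # size in bytes of each block counted by %b
allocated=$((blocks * blksize))   # space actually charged to the file

echo "apparent:  $apparent bytes"
echo "allocated: $allocated bytes"

# du reports allocated space by default; --apparent-size switches to byte length.
du -sh --apparent-size "$f"
du -sh "$f"

rm -rf "$tmp"
```

On a filesystem that keeps two copies of the data, the allocated figure would be expected to come out at roughly twice the apparent one, whereas on a plain ext4 drive the two should be close to each other.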