How to get the real size of a directory used on the disk in Java?


I need to get the real size used by a directory in Java. I am running Linux.

Today I have

FileUtils.sizeOf(path)

But this only sums the bytes of all files in the directory. As far as I can see, it ignores block size and directory entries.

If there are many very small files spread over many subdirectories, the output of FileUtils.sizeOf differs greatly from the real usage (e.g. as reported by the "du" command).

In my sample the difference is a factor of 100 or more.

Is there a library for that? Otherwise I need to call the console's "du" command.


Solution

  • There isn't anything in Java itself, so you'd have to use ProcessBuilder to invoke du. However, du lies just the same.
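
    A minimal sketch of that ProcessBuilder call, assuming a POSIX-style du on the PATH that accepts -s (summarize) and -k (report in 1024-byte units). The class and method names are made up for illustration, and ProcessBuilder.Redirect.DISCARD needs Java 9 or later.

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;

    class DuInvoker {
        /** Runs "du -sk -- dir" and returns the usage it reports, converted to bytes. */
        static long diskUsageBytes(Path dir) throws IOException, InterruptedException {
            Process process = new ProcessBuilder("du", "-sk", "--", dir.toString())
                    .redirectError(ProcessBuilder.Redirect.DISCARD) // ignore warnings on stderr
                    .start();
            String firstLine;
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
                firstLine = reader.readLine();
            }
            int exitCode = process.waitFor();
            if (exitCode != 0 || firstLine == null) {
                throw new IOException("du failed with exit code " + exitCode);
            }
            // With -s, du prints a single line: "<kilobytes><whitespace><path>".
            String kilobytes = firstLine.trim().split("\\s+", 2)[0];
            return Long.parseLong(kilobytes) * 1024L;
        }
    }
    ```

    Calling DuInvoker.diskUsageBytes(Paths.get("/some/dir")) then gives du's view of the directory, with all the caveats that follow.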

    The concept of 'actual occupied disk space' is vague. That sounds weird: disks are simple, right? They store files, and files have a size. The file system spreads a file over blocks and cannot store more than one file segment in a block, so there's the additional concept of rounding each file up to the next multiple of the block size to know how much disk space is occupied. Simple. Done deal.
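
    That naive 'round up to the block size' model could be sketched as follows. This is only an illustration of the simple view described above: the class and method names are hypothetical, and it ignores directory entries, sparse files, and everything discussed below. FileStore.getBlockSize() requires Java 10 or later and may throw UnsupportedOperationException on some file stores.

    ```java
    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;

    class BlockRoundedSize {
        /** Rounds a byte count up to the next multiple of blockSize. */
        static long roundUpToBlock(long bytes, long blockSize) {
            return ((bytes + blockSize - 1) / blockSize) * blockSize;
        }

        /** Sums the block-rounded sizes of all regular files below dir. */
        static long estimate(Path dir, long blockSize) throws IOException {
            long[] total = {0};
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.isRegularFile()) {
                        total[0] += roundUpToBlock(attrs.size(), blockSize);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
            return total[0];
        }

        /** Same estimate, using the block size reported by the file store (Java 10+). */
        static long estimate(Path dir) throws IOException {
            return estimate(dir, Files.getFileStore(dir).getBlockSize());
        }
    }
    ```

    For a directory holding 5 separate files of 200 bytes each and a block size of 1024, this returns 5120.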

    But, no.

    Some of the most commonly used file systems today and in the recent past (pretty much everything, including the Windows set, so ext3, ext4, NTFS, APFS, and so on) have the concept of hard links.

    I can have a directory filled with 5 files, each 200 bytes large, on a file system with a block size of 1024 bytes. And yet, the actual disk space occupied by all this is 1024 bytes, not 5120. Why? The 5 files are all hard links to the same data. How do you want Java to deal with a file on disk that is hard-linked 5 times when asking it: "How much space does this file actually take on disk"? Should all but one file report '0 actual size'? Should they all report 1024? Should each one report 205 (that's 1024 divided by 5, rounded to the nearest integer)? Should each one report 204.8 (i.e. the API of this thing returns double, not long)?
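
    One possible policy, sketched here purely as an illustration, is to charge the full block-rounded size to the first path encountered and zero to every further hard link. The sketch assumes that BasicFileAttributes.fileKey() identifies the underlying file (on Unix-like systems it is typically derived from device and inode; it may be null elsewhere, in which case every path is counted). The class name is hypothetical.

    ```java
    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.HashSet;
    import java.util.Set;

    class HardLinkAwareSize {
        /** Sums block-rounded sizes below dir, counting each hard-linked file once. */
        static long estimate(Path dir, long blockSize) throws IOException {
            Set<Object> seenFileKeys = new HashSet<>();
            long[] total = {0};
            Files.walkFileTree(dir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (!attrs.isRegularFile()) {
                        return FileVisitResult.CONTINUE;
                    }
                    Object key = attrs.fileKey();
                    // A null key means the platform cannot identify the file, so count it.
                    boolean firstSighting = (key == null) || seenFileKeys.add(key);
                    if (firstSighting) {
                        long blocks = (attrs.size() + blockSize - 1) / blockSize;
                        total[0] += blocks * blockSize;
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
            return total[0];
        }
    }
    ```

    For the directory of 5 hard links described above this returns 1024 rather than 5120, but it says nothing about links to the same data that live outside the directory being walked.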

    Modern file systems such as APFS and btrfs go one step further. They have a snapshotting deduplication facility. This is not the same thing as a hard link. A snapshotted link means the two files are considered separate, and changing one does not also change the other (unlike with hard links, where changing one changes them all!). Nevertheless, the file system stores them as snapshot hard links, meaning that as long as nobody touches any of the files, only a single copy is stored. You can change these files if you want to, and they are independent; but if you do that, the file you are changing is effectively 'copied' first.

    How should java, or du, or any other file size reporting tool, report the size of a snapshot-linked file?

    The basic aim, surely, is to go: "I shall ask the system to give me the actual size on disk for all files on disk, add up the numbers, compare that to the reported amount of total disk space a drive can hold, and then I know exactly how many more bytes I can store".

    These problems also exist in reverse. Modern file systems can make state snapshots. If there's a state snapshot and you fully delete a file, that reclaims zero disk space, because a different state still refers to the data. Not a single file on the entire file system can get you those bytes anymore; you'd have to run an instruction to switch to a different state first. And still nothing is reclaimed until you ditch the other state snapshot. Hence:

    That aim? Effectively impossible on modern file systems. du? Nope. du does count a hard-linked file only once within a single run, but it knows nothing about shared copies or snapshots. du is far too 'stupid' to try to figure all this out.

    Java, wisely, hasn't waded in. The concept of 'file size' is clear enough: If nobody changes this file and I open it and start reading from it, how many bytes will I read before the stream of bytes ends? That has a clear, unambiguous answer that has none of these complications. Java can give you that, with Files.size().
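
    As a small illustration of that definition, the sketch below compares Files.size() with the number of bytes actually delivered by reading the file to the end. The class and method names are hypothetical; only Files.size() and Files.newInputStream() come from the JDK.

    ```java
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class LogicalSize {
        /** Counts the bytes delivered when reading the file from start to end. */
        static long countBytesByReading(Path file) throws IOException {
            long count = 0;
            byte[] buffer = new byte[8192];
            try (InputStream in = Files.newInputStream(file)) {
                int n;
                while ((n = in.read(buffer)) != -1) {
                    count += n;
                }
            }
            return count;
        }

        public static void main(String[] args) throws IOException {
            Path file = Files.createTempFile("logical-size", ".bin");
            Files.write(file, new byte[200]);
            // Both numbers agree as long as nobody modifies the file in between.
            System.out.println(Files.size(file) + " == " + countBytesByReading(file));
            Files.delete(file);
        }
    }
    ```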