Search code examples
data-structuresfilesystemshardwarehard-drivedrives

How disk space is allocated for an edited file


Assume I save a text file in the HDD disk storage(assume the disk storage is new and so defragmented) and the file name is A with a file size of say 10MB

I presume, the file A occupies some space in the disk as shown, where x is an unoccupied space/memory on the disk

AAAAAAAAAAAAAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Now, I create and save another file B of some size. So B will be saved as

AAAAAAAAAAAAABBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxx - as the disk is defragmented, I assume the storage will be contiguous.

Here, what if I edit the file A and reduce the file size to 2MB. Can you say how the memory will be allocated now.

Some options I could think of are
AAAAAAxxxxxxxxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx

or
AAxxxAAxxxAxAxxBBBBBBBBBBBBBBBBxxxxxxxxxxxxxxxxxxxxxxxxxxxx

or a totally new location freeing up the bigger chunk for other files.
xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBAAAAAAxxxxxxxxxxxxxxxxxxxxxx

or is it any other way based on any algorithm or data-structure.


Solution

  • A lot of this would depend upon what type of filesystem you are using (and also how the OS interacts with it). The behavior of an NTFS filesystem in Windows may be nothing like the behavior of an ext3 filesystem in Ubuntu for the same set of logical operations.

    Generally speaking, however, most modern filesystems define a file as a series of pointers to blocks on the disk. There is a minimum block size that describes the smallest allocatable block (typically ranging from 512 bytes to 4 KBytes), so files that are less than this size or not some exact multiple of this size will have some amount of extra space allocated to them.

    So what happens when you allocate a 10 MB file 'A'? The filesystem reserves 10MB worth of blocks (perhaps even allowing for a few extra blocks at the end to accommodate any minor edits that are made to the file or its metadata) for the file contents. Ideally these blocks will be contiguous, as in your example. When you edit 'A' and make it smaller, the filesystem will release some or all (most likely all since in most cases editing 'A' involves writing out the entire contents of 'A' to disk again, so there's little reason for the filesystem to prefer keeping 'A' in the same physical location over writing the data to a new location somewhere else on the disk) of the blocks allocated to 'A', and update its reference to include any new blocks that were allocated, if necessary.

    With that said, in the typical case and using a modern filesystem and OS, I would expect your example to produce the following final state on disk ('b' and 'a' represent extra bytes allocated to 'B' and 'A' that do not contain any meaningful data):

    xxxxxxxxxxxxxxxBBBBBBBBBBBBBBBBbbAAAAAAaaxxxxxxxxxxxxxxxxxxxxxx

    But real-world results will of course vary by filesystem, OS, and potentially other factors (for instance, when using an SSD data fragmentation becomes irrelevant because any section of the disk can be accessed at very low latency and with no seek penalty but at the same time it becomes important to minimize write cycles so that the device doesn't wear-our prematurely, so the OS may favor leaving 'A' in place as much as possible in that case in order to minimize the number of sectors that need to be overwritten).

    So the short answer is, "it depends".