
Contiguous space on hard disk - NTFS


My question is about file allocation methods on the NTFS file system.

I have three main questions:

  1. When I create a file on NTFS, is it stored contiguously on the physical hard disk?
  2. If not, is there a way to create a file such that the data I write to it is stored contiguously on the hard disk? Something like extents in a database.
  3. If such a file exists, is there any way to read data from it (using the C read system call) in bunches/blocks? What is the maximum bunch size I can use?

I am trying to make a simple file-based DB for small applications and would like to keep the whole DB in one file. For performance reasons I need to keep my data in contiguous order on the disk and read it in bunches. (I plan to mmap this file in my application.)


Solution

  • OK, so let's answer point by point...

    Question 1: When I create a file on NTFS, is it stored contiguously on the physical hard disk?

    The question makes no sense. When you create a file, NTFS allocates space in the MFT (Master File Table) for the metadata it needs to track the file. Small files may fit entirely inside their MFT record; such resident files are, by definition, contiguous. If a file won't fit inside its MFT record, clusters are allocated as needed, and they may or may not be contiguous. Generally speaking, NTFS doesn't know how big your file will be or how much space to preallocate for it, so it simply allocates space as the file grows. You can give it a hint by extending the file up front with the SetEndOfFile function, but that is only a hint: there is no guarantee that the file data will be stored in a contiguous area of the disk. In fact, it should be easy to convince yourself that even if the filesystem performed real-time defragmentation, it could never guarantee that the free space would be available as a single, contiguous run of disk addresses.
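
    A minimal sketch of that preallocation hint, assuming a Windows build for the NTFS path: the hypothetical helper preallocate_file() moves the file pointer with SetFilePointerEx and then calls SetEndOfFile. The POSIX branch (posix_fallocate) is a stand-in added only so the sketch also compiles on other systems; it is not an NTFS API. In both cases the filesystem still decides where the clusters go, so contiguity is not guaranteed.

```c
/* posix_fallocate() needs POSIX.1-2001 declarations on non-Windows builds. */
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200112L
#endif

#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <unistd.h>
#endif

/* Hypothetical helper: create (or open) `path` and reserve `size` bytes
 * up front. Returns 0 on success, -1 on failure. This is only a hint to
 * the filesystem; it does NOT guarantee the space is contiguous. */
int preallocate_file(const char *path, int64_t size)
{
#ifdef _WIN32
    HANDLE h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                           OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return -1;

    LARGE_INTEGER pos;
    pos.QuadPart = size;
    /* Move the file pointer to the desired length, then make that
     * position the end of file so the filesystem reserves the space. */
    BOOL ok = SetFilePointerEx(h, pos, NULL, FILE_BEGIN) && SetEndOfFile(h);
    CloseHandle(h);
    return ok ? 0 : -1;
#else
    /* Stand-in for non-Windows systems (not NTFS-specific). */
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    /* posix_fallocate() returns an error number instead of setting errno. */
    int rc = posix_fallocate(fd, 0, (off_t)size);
    close(fd);
    return rc == 0 ? 0 : -1;
#endif
}
```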


    Question 2: If not, is there a way to create a file such that the data I write to it is stored contiguously on the hard disk? Something like extents in a database.

    Why do you think this is an important concern? You generally shouldn't care how the filesystem stores your data; you should only care that it does store the data. You may think that accessing a file that isn't stored contiguously would be slower, but that is not necessarily the case: caching and prefetching by the O/S will often eliminate any slowdown completely. If your concern is performance, do you have actual hard data showing that fragmentation by the filesystem is an issue? If so, the correct approach is to either use a different filesystem or no filesystem at all.
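
    If you want that hard data, a rough starting point is to time a sequential read of the file. The sketch below uses only standard C11 (fread, timespec_get); time_sequential_read() and the read_timing struct are hypothetical names made up for illustration. Keep in mind that a second run over the same file will usually be served from the OS file cache, which is exactly the effect described above; measuring the raw disk requires a cold cache.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Result of one timed pass over a file (hypothetical struct). */
typedef struct {
    long long bytes;  /* total bytes read, or -1 on error */
    double seconds;   /* wall-clock time of the pass */
} read_timing;

/* Hypothetical helper: read `path` front to back in `chunk`-byte pieces
 * and report how many bytes were read and how long it took. */
read_timing time_sequential_read(const char *path, size_t chunk)
{
    read_timing r = { -1, 0.0 };
    FILE *f = fopen(path, "rb");
    char *buf = malloc(chunk);
    if (f == NULL || buf == NULL) {
        if (f != NULL)
            fclose(f);
        free(buf);
        return r;
    }

    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);

    long long total = 0;
    size_t n;
    while ((n = fread(buf, 1, chunk, f)) > 0)
        total += (long long)n;

    timespec_get(&t1, TIME_UTC);

    /* Only report a byte count if the loop ended at EOF, not on an error. */
    if (!ferror(f))
        r.bytes = total;
    r.seconds = (double)(t1.tv_sec - t0.tv_sec)
              + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    fclose(f);
    free(buf);
    return r;
}
```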


    Question 3: If such a file exists, is there any way to read data from it (using the C read system call) in bunches/blocks? What is the maximum bunch size I can use?

    The C I/O functions (fread from the standard library, or read at the system-call level) don't know about NTFS, fragmentation, "bunches" or blocks. All they know is how to read up to the requested number of bytes from the specified file and put the data into a buffer that you supply; they may return fewer bytes than you asked for (at end of file, for example), so always check the return value. You can request any size you want, within the limits of the count parameter's type, although underneath, the C library and the O/S will read data in multiples of the filesystem's block size, which is implementation defined.
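
    As an illustration for the small-database use case, here is a minimal sketch of reading fixed-size pages with fread. The page size (DB_PAGE_SIZE = 4096) and the read_page() helper are hypothetical choices, not something NTFS or the C library requires; the point is that you pick the request size and fread tells you how many bytes you actually got.

```c
#include <stdio.h>

/* Hypothetical page size for the toy database. 4096 is a common choice,
 * but no API requires it. */
#define DB_PAGE_SIZE 4096

/* Hypothetical helper: read page `page_no` of an open database file into
 * `buf` (which must hold DB_PAGE_SIZE bytes). Returns the number of bytes
 * read: DB_PAGE_SIZE for a full page, fewer for a short last page, 0 past
 * end of file, or -1 on error. */
long read_page(FILE *f, long page_no, unsigned char *buf)
{
    /* fseek() takes a long offset, which limits files to 2 GiB where long
     * is 32 bits (e.g. on Windows). */
    if (fseek(f, page_no * DB_PAGE_SIZE, SEEK_SET) != 0)
        return -1;

    size_t n = fread(buf, 1, DB_PAGE_SIZE, f);
    /* A short read is normal at end of file; only ferror() means failure. */
    if (n < DB_PAGE_SIZE && ferror(f))
        return -1;
    return (long)n;
}
```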