Tags: mysql, c, database, operating-system, innodb

How to ask for contiguous (non-fragmented) blocks when appending or creating files in C?


So I want to understand how DBMS implementations work.

To give an example:

MySQL implements each table with its own pages, which are 16KB each.

So each table is a file whose size is a multiple of 16KB, depending on how large the table is and therefore how many pages it needs.

Now I read somewhere that these pages don't get fragmented in the disk image or the memory image, so my question is: HOW?

How do DBMS developers tell the operating system, "Hey, I just appended a 16KB page to this file, but make sure this page doesn't get fragmented"?

Is it because the memory image doesn't actually show how the bytes are really stored on disk, i.e. the image is only logical?

Or is it because these DBMSes somehow ask the OS to make sure these 16KB chunks of bytes do not get fragmented?

And how does one do this in C?


Solution

  • 50 years ago, your question was a hot topic in Computer Science and Engineering. But not today.

    Virtually every hard drive has an allocation unit of 512 bytes. CDs have an AU of 2KB. Certain SSDs, when tuned for MySQL, have an AU of 16KB.

    There are many different "filesystems". Windows has (at least) FAT-32 and NTFS; *nix has lots. Each FS prides itself on doing a better job at something. But freespace management fights with allocation unit size. Remember the hassle DOS had with its FAT-16 while disks were getting bigger and bigger? The "16" in the name refers to the filesystem addressing at most 2^16 clusters, which forced a 2GB disk drive to use an allocation unit of 32KB (2^16 × 32KB = 2GB). The typical system had a lot of little files, so literally half the disk was probably wasted!

    I'm talking about "Allocation Units" because that is essentially the only way to prevent the OS from thinking about scattering the blocks around the drive.

    Let's look at your question from a marketing point of view. If fragmentation is such a big deal, then

    1. New, better filesystems would come along to solve the problem -- though not necessarily in the simplistic way you mentioned.
    2. Operating systems have known about the problem, so they have ways of "trying" to allocate in chunks. But they are always willing to give you little pieces when necessary.
    3. MySQL's InnoDB (circa 2000) went to a lot of effort to allocate in "extents" of 64 consecutive 16KB pages (1MB) in hopes of getting contiguously allocated disk. But when that failed, nothing would crash.
    4. Software would bypass the problem, such as by using "raw drive access". But notice how that is not in the forefront of "how to optimize your database"? If even available, it is buried in the "Oh, by the way" chapter. (A minimal sketch of what such access can look like follows this list.)
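
    To make point 4 concrete, here is a minimal sketch, assuming Linux: the closest common form of "raw" access is to open the block device node directly and use O_DIRECT to bypass the page cache, so the program decides exactly where each 16KB page lands on the device. The device path is made up for illustration (writing to a real one clobbers whatever is on it), and O_DIRECT buffers must be aligned to the device's logical block size.

        /* Hedged sketch: direct, unbuffered access to a block device (Linux). */
        #define _GNU_SOURCE           /* exposes O_DIRECT */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>

        #define PAGE_SIZE (16 * 1024)     /* one 16KB "page" */

        int main(void)
        {
            /* Hypothetical device node -- writing here destroys its contents. */
            int fd = open("/dev/sdX", O_RDWR | O_DIRECT);
            if (fd == -1) { perror("open"); return 1; }

            void *buf;
            /* O_DIRECT needs aligned buffers; 4096 covers common block sizes. */
            if (posix_memalign(&buf, 4096, PAGE_SIZE) != 0) { close(fd); return 1; }
            memset(buf, 0, PAGE_SIZE);

            /* Write one page at an offset we choose; no filesystem is involved,
             * so there is nothing to fragment it. */
            if (pwrite(fd, buf, PAGE_SIZE, 0) != (ssize_t)PAGE_SIZE)
                perror("pwrite");

            free(buf);
            close(fd);
            return 0;
        }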

    A few decades ago, there were some OSs that would let you pre-allocate a file that was "contiguous". I have not heard of such recently.
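
    The closest modern POSIX analogue I can point at is posix_fallocate(3) (or Linux's fallocate(2)): it reserves the disk blocks for a file up front, which gives the filesystem its best shot at laying them out contiguously, but it is only a reservation, not a contiguity guarantee. A minimal sketch, with a made-up file name and an arbitrary 4MB reservation:

        /* Hedged sketch: pre-reserving file space with posix_fallocate (POSIX). */
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        #define PAGE_SIZE (16 * 1024)     /* one 16KB page */
        #define NUM_PAGES 256             /* reserve 4MB up front */

        int main(void)
        {
            /* File name is made up for illustration. */
            int fd = open("table_pages.dat", O_RDWR | O_CREAT, 0644);
            if (fd == -1) { perror("open"); return 1; }

            /* Ask the OS to back bytes [0, 4MB) with real blocks now, rather
             * than allocating piecemeal as pages are appended later.  The
             * filesystem may or may not make the result contiguous. */
            int err = posix_fallocate(fd, 0, (off_t)PAGE_SIZE * NUM_PAGES);
            if (err != 0)
                fprintf(stderr, "posix_fallocate: %s\n", strerror(err));

            close(fd);
            return 0;
        }

    This is roughly the same idea as InnoDB's extents: grab a bigger chunk once, then fill it one 16KB page at a time, instead of asking for space on every append.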

    Enterprise systems solved the issue by using hardware RAID controllers with Battery Backed Write Cache. Not only would the scatter-gather be hidden from the user, but writes became 'instantaneous' because of the crash-safe cache.

    SSDs don't have any seek time (unlike HDDs), so it really does not matter if a block is chopped up. Sure, there is some code to deal with it, but that is really insignificant compared to the transfer, checksum, mutex, system call, etc, etc, times.

    I have a Rule of Thumb: If a potential optimization does not look like it will help by 10%, I drop it and move on to something else. I suggest you move on.