Tags: mysql, database, file, cursor, fwrite

Variable-length file writing in databases


For learning/experimental purposes, I'm trying to mimic some techniques I learned from studying databases, and I'm curious how MySQL (and maybe other databases) solves this particular problem.

So I'm writing an application that, like other databases, stores records side by side in a single file. I use another file to index the positions of the records so they can be looked up quickly. Everything works fine until I need to update a row with data that is longer than the row's current version. I have some ideas, but none of them seem very performance-friendly.

Let's say I want to update record 200 out of 1,000 records. In my logic I place the file cursor where the row begins and write the data. Let's say the current version of the row is 100 bytes long (and the next record begins at the 101st byte). The new data is 150 bytes long, so simply writing at the cursor would overwrite the first 50 bytes of the next record.
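
To make the problem concrete, here is a minimal C sketch of the update logic I mean (the file name, offsets, and sizes are made up for illustration):

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Hypothetical data file: record 200 starts at byte offset 20000
         * and currently occupies 100 bytes; record 201 starts right
         * after it, at offset 20100. */
        FILE *f = fopen("records.dat", "r+b");
        if (!f) return 1;

        char new_data[150];                      /* the longer new version */
        memset(new_data, 'x', sizeof new_data);

        fseek(f, 20000, SEEK_SET);
        /* Writing 150 bytes into a 100-byte slot: the last 50 bytes
         * clobber the beginning of record 201. */
        fwrite(new_data, 1, sizeof new_data, f);

        fclose(f);
        return 0;
    }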

To my knowledge you cannot "push" data forward in a file from the cursor - and even if you could, it doesn't seem like a performance-friendly operation, since every byte after the insertion point would have to be rewritten.

I do have the option to append the new data to the end of the file and replace the current row with NULL bytes. But that seems to a) waste space and b) again require a lot of machine work to later rebuild the file without the NULL bytes.

And then there's the option of defragmentation, but I'm not ready to go that direction yet.

Does anybody know how other databases deal with this?


Solution

  • Other databases handle this in several ways. I can answer for MySQL.

    • The first time you write a record into some space in the file, leave a bit of extra room. Organize the storage into "pages" of 16KB, each of which holds several records, but leave 1/16 of each page empty initially to allow rows to expand (see the sketch at the end of this answer). Each page is loaded into RAM on demand, and the records in it may be reorganized a bit before the page is written back to disk.

    • If rows grow beyond the free space in a page, the page may be split: some records are relocated to other, newly allocated pages, which might be quite far away in the file. The indexing that keeps track of record locations does not require the records to be adjacent.

    • The empty space left over from all the record reorganizing and splitting results in some fragmentation, but it's usually a small percentage of the overall storage, so we don't worry about it. Eventually the fragmentation may get worse, so it's a good idea to periodically copy all the records into a fresh set of pages, reorganized more compactly, to replace the original. How often you should do this depends on how much activity your database has seen; there's no strict rule for it.

    • Another relatively recent feature that might help is called sparse files, or hole-punching. Traditionally, all the contiguous bytes of a file occupy space on disk, whether or not you are storing useful data in those bytes. But what if the gaps within the file could be returned to the filesystem as free space? Then you wouldn't care about the fragmentation. Not all filesystems support this, and the "holes" are generally limited to multiples of the filesystem block size (e.g. 4KB).

      MySQL 5.7 makes use of hole-punching in its page compression feature. MySQL still stores data in 16KB pages, but you can enable optional compression of the data within a page. If the compression leaves a gap of at least 4KB (the size of a filesystem block), it treats that gap as a hole and frees the filesystem storage behind it, as sketched below.
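
      As a rough illustration of hole-punching (Linux-specific, and only on filesystems that support it, such as ext4 or XFS; the file name and offset are made up), the fallocate(2) call below frees the disk blocks behind a 4KB-aligned gap while the file's apparent size stays the same:

          #define _GNU_SOURCE
          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>

          int main(void) {
              int fd = open("records.dat", O_RDWR);
              if (fd < 0) { perror("open"); return 1; }

              /* Punch a 4KB hole at offset 16384: the range still reads
               * back as zeros, but the filesystem block behind it is
               * freed. KEEP_SIZE preserves the file's apparent length. */
              if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                            16384, 4096) != 0)
                  perror("fallocate");   /* e.g. filesystem lacks support */

              close(fd);
              return 0;
          }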

    There are many other tricks possible. It's not worth trying to optimize storage down to the byte, because as soon as you do, another data update will require you to undo it. It's usually better to optimize for quick updates than for perfectly compact storage. Everything comes down to a tradeoff between different types of efficiency (for example, speed vs. storage), and you have to decide what's more important for your database.
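
    As a rough sketch of the "leave room to grow" idea from the first point above (illustrative only - this is not InnoDB's actual page format), an insert routine can simply refuse to fill a page past 15/16 of its capacity:

        #include <stdbool.h>
        #include <stdint.h>
        #include <string.h>

        #define PAGE_SIZE 16384               /* 16KB page, as in InnoDB */
        #define RESERVED  (PAGE_SIZE / 16)    /* keep 1/16 free for growth */

        /* Simplified page: a byte count plus a payload area (record
         * headers, slot directory, checksums, etc. are omitted). */
        typedef struct {
            uint16_t used;
            uint8_t  data[PAGE_SIZE];
        } page_t;

        /* On the initial load, stop filling once only the reserved slack
         * remains, so rows can later expand in place. */
        bool page_insert(page_t *p, const void *rec, uint16_t len) {
            if ((uint32_t)p->used + len > PAGE_SIZE - RESERVED)
                return false;                 /* caller starts a new page */
            memcpy(p->data + p->used, rec, len);
            p->used += len;
            return true;
        }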