Search code examples
performancehard-driveheuristics

Optimizing locations of on-disk data for sequential access


I need to store large amounts of data on-disk in approximately 1k blocks. I will be accessing these objects in a way that is hard to predict, but where patterns probably exist.

Is there an algorithm or heuristic I can use that will rearrange the objects on disk based on my access patterns to try to maximize sequential access, and thus minimize disk seek time?


Solution

  • Depending on what you mean by "hard to predict", I can think of a few options:

    If you always seek based on the same block field/property, store the records on disk sorted by that field. This lets you use binary search for O(log n) efficiency.

    If you seek on different block fields, consider storing an external index for each field. A b-tree gives you O(log n) efficiency. When you seek, grab the appropriate index, search it for your block's data file address and jump to it.

    Better yet, if your blocks are homogeneous, consider breaking them down into database records. A database gives you optimized storage, indexing, and the ability to perform advanced queries for free.