Tags: c++, c, file, optical-drive

What approach works best for quickly reading files off of optical drives?


When reading files off a hard drive, mmap is generally regarded as a good way to get data into memory quickly. With optical drives, accesses take longer and latency is a bigger concern. What approach or abstraction do you use to hide or eliminate as much of the optical drive's latency and overall load time as possible?


Solution

  • There's no real abstraction you can employ. Optical drives have very specific characteristics that must be optimized for to get the best performance.

    Some tips:

    The biggest killer on optical drives is seek time. Where possible, make sure all the files you are reading are laid out sequentially on the disc and packed as closely together as possible. If you must seek, seek in one direction and as infrequently as possible.

    Asynchronous reading can also massively improve performance. If you need to load and process files A, B and C, start reading file B before you process A, read file C while you process B, and so on.
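
The overlap described above can be sketched with `std::async`. This is only an illustration under stated assumptions: `read_whole_file` and `load_pipelined` are hypothetical names, each file is assumed to fit in memory, and a real loader would report I/O errors instead of returning an empty buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical helper: read one file into memory with a single large read.
// Returns an empty buffer if the file cannot be opened.
std::vector<char> read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamsize size = in.tellg();
    if (size <= 0) return {};
    std::vector<char> data(static_cast<std::size_t>(size));
    in.seekg(0);
    in.read(data.data(), size);
    data.resize(static_cast<std::size_t>(in.gcount()));
    return data;
}

// Hypothetical helper: process files in order while the next file is
// being read on another thread.
void load_pipelined(
    const std::vector<std::string>& paths,
    const std::function<void(const std::string&, const std::vector<char>&)>& process) {
    assert(static_cast<bool>(process));
    if (paths.empty()) return;
    // Kick off the first read before any processing starts.
    auto pending = std::async(std::launch::async, read_whole_file, paths[0]);
    for (std::size_t i = 0; i < paths.size(); ++i) {
        std::vector<char> data = pending.get();  // wait for file i
        if (i + 1 < paths.size()) {
            // Start reading file i+1 before file i is processed.
            pending = std::async(std::launch::async, read_whole_file, paths[i + 1]);
        }
        process(paths[i], data);  // runs while the next read is in flight
    }
}
```

Only one read is in flight at a time, so the drive is never asked to serve two files at once, which would reintroduce seeking.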

    Generally, the more data you can read in one go the better, e.g. avoid lots of small read() calls. You will only get close to the theoretical throughput of a disc when reading large amounts of data. Some OSes/drivers minimize the penalty of reading lots of small files by caching sectors; some do not.
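
As a rough sketch of "few large reads instead of many small ones", the hypothetical `read_in_chunks` below streams a file through one big reusable buffer. The chunk size is an assumption to be tuned by benchmarking, and whether the standard library forwards each call as a single OS-level read is implementation-dependent.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: read `path` in chunks of up to `chunk_size` bytes,
// passing each chunk to `consume`. Returns the total number of bytes read
// (0 if the file cannot be opened).
std::size_t read_in_chunks(
    const std::string& path, std::size_t chunk_size,
    const std::function<void(const char*, std::size_t)>& consume) {
    assert(chunk_size > 0);
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buffer(chunk_size);  // allocated once, reused per chunk
    std::size_t total = 0;
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;  // nothing left to read
        consume(buffer.data(), got);
        total += got;
    }
    return total;
}
```

A call such as `read_in_chunks("level1.pak", 1 << 20, handler)` would request 1 MiB at a time; both the file name and the size are placeholders.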

    Doing lots of exists(filename) checks can also be detrimental on some filesystems/OSes where only parts of the TOC are cached.
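
One possible mitigation, which the answer itself does not spell out, is to enumerate a directory once and answer later existence checks from memory. The sketch below assumes C++17 `std::filesystem`; `snapshot_directory` and `exists_cached` are hypothetical names, and whether this beats repeated checks on a given drive and OS needs to be measured.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <unordered_set>

// Hypothetical helper: scan `dir` once and remember the names of its entries.
// Returns an empty set if the directory cannot be read.
std::unordered_set<std::string> snapshot_directory(const std::filesystem::path& dir) {
    std::unordered_set<std::string> names;
    std::error_code ec;
    for (std::filesystem::directory_iterator it(dir, ec), end;
         !ec && it != end; it.increment(ec)) {
        names.insert(it->path().filename().string());
    }
    return names;
}

// Hypothetical helper: existence check answered from the in-memory snapshot,
// without touching the disc again.
bool exists_cached(const std::unordered_set<std::string>& names, const std::string& name) {
    assert(!name.empty());
    return names.count(name) != 0;
}
```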

    In our applications we usually pack files into one or more "lumped" files, ordered sequentially by access order. Some files (and directories) are compressed and read in their entirety before being decompressed in memory. This can be a win if you have a directory that contains a multitude of small files (e.g. XML or scripts).
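
A minimal sketch of such a "lumped" file follows. The layout (raw blobs back to back, with an index of name, offset and size kept separately) and the names `PackEntry`, `write_pack` and `read_entries` are assumptions made for illustration, not the format used in the answer's applications; compression is left out. Requests are sorted by offset so the reads only ever move forward through the lump.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index entry: where one packed file lives inside the lump.
struct PackEntry {
    std::string name;
    std::uint64_t offset;
    std::uint64_t size;
};

// Append each (name, contents) pair to the lump in the given order, which
// should match the expected access order, and return the index.
std::vector<PackEntry> write_pack(
    const std::string& pack_path,
    const std::vector<std::pair<std::string, std::string>>& files) {
    std::ofstream out(pack_path, std::ios::binary);
    std::vector<PackEntry> index;
    std::uint64_t offset = 0;
    for (const auto& f : files) {
        out.write(f.second.data(), static_cast<std::streamsize>(f.second.size()));
        index.push_back({f.first, offset, f.second.size()});
        offset += f.second.size();
    }
    return index;
}

// Read the requested entries in ascending offset order, one read per entry.
std::vector<std::pair<std::string, std::string>> read_entries(
    const std::string& pack_path, std::vector<PackEntry> wanted) {
    std::sort(wanted.begin(), wanted.end(),
              [](const PackEntry& a, const PackEntry& b) { return a.offset < b.offset; });
    std::ifstream in(pack_path, std::ios::binary);
    assert(in.is_open());  // sketch-level check; real code would report the error
    std::vector<std::pair<std::string, std::string>> result;
    for (const auto& e : wanted) {
        std::string data(static_cast<std::size_t>(e.size), '\0');
        in.seekg(static_cast<std::streamoff>(e.offset));
        in.read(&data[0], static_cast<std::streamsize>(e.size));
        data.resize(static_cast<std::size_t>(in.gcount()));  // tolerate short reads
        result.emplace_back(e.name, std::move(data));
    }
    return result;
}
```

When the requested entries are adjacent in the lump, the forward seeks become no-ops and the reads are effectively one sequential pass.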

    Basically lots of benchmarking and tweaking :)