Tags: c++, file, random-access

C++ / Fast random access skipping in a large file


I have large files, each containing a small number of large datasets. Each dataset has a name and its size in bytes, which makes it possible to skip over it and jump to the next dataset.

I want to build an index of dataset names very quickly. A sample file is about 21 MB and contains 88 datasets. Reading the 88 names with a std::ifstream, using seekg() to skip from one dataset to the next, takes about 1300 ms, which I would like to reduce.

So in effect, I'm reading 88 chunks of about 30 bytes each at known positions in a 21 MB file, and it takes 1300 ms.
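For reference, the indexing pass looks roughly like this (a minimal sketch; the exact record layout — a length-prefixed name followed by a 64-bit payload size — is an assumption, not the real format):

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record layout: [uint32 nameLen][name bytes][uint64 dataSize][data bytes]
std::vector<std::pair<std::string, std::streamoff>> buildIndex(const std::string& path)
{
    std::vector<std::pair<std::string, std::streamoff>> index;
    std::ifstream in(path, std::ios::binary);

    while (in) {
        std::streamoff offset = in.tellg();

        std::uint32_t nameLen = 0;
        if (!in.read(reinterpret_cast<char*>(&nameLen), sizeof nameLen))
            break;                                   // end of file reached

        std::string name(nameLen, '\0');
        in.read(&name[0], nameLen);

        std::uint64_t dataSize = 0;
        in.read(reinterpret_cast<char*>(&dataSize), sizeof dataSize);

        index.emplace_back(name, offset);

        // Skip the dataset payload and continue with the next header.
        in.seekg(static_cast<std::streamoff>(dataSize), std::ios::cur);
    }
    return index;
}
```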

Is there a way to improve this, or is it an OS and filesystem limitation? I'm running the test on Windows 7 64-bit.

I know that having a complete index at the beginning of the file would be better, but the file format does not have this, and we can't change it.


Solution

  • You could scan the file once and write your own header, mapping each dataset name to its offset, into a separate index file. Depending on your use case, you can rebuild it at program start or whenever the big file changes. Before accessing the big data, a lookup in this small file gives you the offset you need; see the sketch below.
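
A minimal sketch of that idea, reusing the hypothetical buildIndex() from the question above (it assumes dataset names contain no whitespace; the sidecar file name is arbitrary):

```cpp
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Write one "name offset" line per dataset to a small sidecar index file.
void writeSidecarIndex(const std::vector<std::pair<std::string, std::streamoff>>& index,
                       const std::string& indexPath)
{
    std::ofstream out(indexPath);
    for (const auto& entry : index)
        out << entry.first << ' ' << entry.second << '\n';
}

// Reload the sidecar index; this touches only a few hundred bytes
// instead of seeking through the whole 21 MB file.
std::vector<std::pair<std::string, std::streamoff>> readSidecarIndex(const std::string& indexPath)
{
    std::vector<std::pair<std::string, std::streamoff>> index;
    std::ifstream in(indexPath);
    std::string name;
    std::streamoff offset = 0;
    while (in >> name >> offset)
        index.emplace_back(name, offset);
    return index;
}
```

A common pattern is to rebuild the sidecar only when the big file's size or modification time has changed, so later runs pay only the cost of reading the small index.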