
File based look-up table


You need an array of 10^10 4-byte integers to be used as a look-up table. Loading it into RAM would take 40 GB, which isn't feasible. You never need to write to this array after it has been initialized. You need to read individual integer values from random locations of this array concurrently from multiple threads of a single process. You're guaranteed to be on a 64-bit platform. What is the fastest implementation of this look-up table: regular file-reading functions, or something like a Boost memory-mapped file?
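For comparison, the memory-mapped option might look like the minimal sketch below. It calls POSIX `mmap` directly instead of Boost.Interprocess (a swap; Boost's `file_mapping` and `mapped_region` wrap the same mechanism on POSIX). It assumes a POSIX system, a file of native-endian `std::int32_t` values, and a hypothetical class name `MappedTable`. On a 64-bit platform the whole 40 GB file fits in the virtual address space even though it does not fit in RAM, and because the table is never written after initialization, concurrent reads need no locking. The `posix_madvise(..., POSIX_MADV_RANDOM)` call is only an advisory hint that access is random.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap, posix_madvise
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical read-only table backed by a memory-mapped file of
// native-endian std::int32_t values (POSIX only; assumes 64-bit addresses).
class MappedTable {
public:
    explicit MappedTable(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");
        struct stat st{};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        // Advisory hint: accesses are random, so large read-ahead is unhelpful.
        (void)::posix_madvise(p, size_, POSIX_MADV_RANDOM);
        data_ = static_cast<const std::int32_t*>(p);
    }
    ~MappedTable() { ::munmap(const_cast<std::int32_t*>(data_), size_); }
    MappedTable(const MappedTable&) = delete;
    MappedTable& operator=(const MappedTable&) = delete;

    std::size_t count() const { return size_ / sizeof(std::int32_t); }

    // Safe to call from many threads at once: the mapping is read-only.
    // Pages are faulted in from the file on first access.
    std::int32_t at(std::size_t i) const { return data_[i]; }

private:
    const std::int32_t* data_ = nullptr;
    std::size_t size_ = 0;
};
```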


Solution

  • It sounds like you should do explicit reads.

    Memory mapping gets its speed from bringing in large chunks of pages at a time (I believe Windows does 256 KiB; I'm not sure about other platforms) and from letting you re-access them without any penalty after the first time.

    If you're just reading integers from random locations, you'll be reading in 256 KiB to use just 4 bytes of one page, and you may never re-access it. Such a waste! Also consider that you may have just paged out a lot of potentially useful data belonging to other applications and to the filesystem cache.
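A minimal sketch of the explicit-read approach, assuming a POSIX system and the same file layout of native-endian `std::int32_t` values; `PreadTable` is a hypothetical name. It uses `pread`, which takes the offset as an argument instead of moving a shared file position, so all threads can use one descriptor without a lock. The `posix_fadvise(..., POSIX_FADV_RANDOM)` call is an advisory hint (on Linux it disables read-ahead for that file). Each cache miss still reads at least one page, typically 4 KiB, so this limits the waste rather than eliminating it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

#include <fcntl.h>   // open, posix_fadvise
#include <unistd.h>  // pread, close

// Hypothetical lookup table that fetches each value with an explicit
// positioned read instead of mapping the file (POSIX only).
class PreadTable {
public:
    explicit PreadTable(const std::string& path)
        : fd_(::open(path.c_str(), O_RDONLY)) {
        if (fd_ < 0) throw std::runtime_error("open failed");
        // Advisory hint: random access, so skip read-ahead where supported.
        (void)::posix_fadvise(fd_, 0, 0, POSIX_FADV_RANDOM);
    }
    ~PreadTable() { ::close(fd_); }
    PreadTable(const PreadTable&) = delete;
    PreadTable& operator=(const PreadTable&) = delete;

    // pread does not touch the descriptor's file offset, so concurrent
    // calls from multiple threads on the same fd_ are safe.
    std::int32_t at(std::uint64_t i) const {
        std::int32_t value = 0;
        const off_t offset = static_cast<off_t>(i * sizeof(std::int32_t));
        ssize_t n = ::pread(fd_, &value, sizeof value, offset);
        if (n != static_cast<ssize_t>(sizeof value))
            throw std::runtime_error("short read or index out of range");
        return value;
    }

private:
    int fd_;
};
```

Each lookup here costs a system call, while a mapped lookup costs nothing once its page is resident, so it is worth measuring both on the real access pattern.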