java, data-structures, inverted-index

How can I store an inverted document index on disk?


I know this question has been asked again and again on Stack Overflow and elsewhere, but none of the answers I have found satisfy me. Most of the solutions assume that the whole index fits in memory, so it can be written to disk with Java serialization and loaded back into memory in its entirety whenever it is needed (for example, solution 1 and solution 2). That assumption does not always hold. How should I store an inverted document index on disk when it does not fit in memory?

I would appreciate a solution in Java.


Solution

  • I would try JDBM3. It supports tree and hash collections, and its only requirement is that each key or entry fits into memory.

    If you have very large entries, I suggest storing each one in its own file, which can be memory-mapped to extract portions of the data. In the lookup table you can then map keys to file names (or use the file names as the keys).
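A rough sketch of the file-per-entry idea, using only java.nio, might look like the following. It assumes each posting list is a flat array of 4-byte document IDs, which is my assumption and not something the question specifies. The class `PostingFiles` and its methods are hypothetical names for illustration, and the in-memory `HashMap` merely stands in for the on-disk lookup table (for example a JDBM3 map) that would hold the term-to-file-name mapping in a real index.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: one file of 4-byte doc IDs per term. */
class PostingFiles {
    private final Path dir;
    // Stand-in for the persistent lookup table (term -> file name).
    private final Map<String, String> fileNames = new HashMap<>();

    PostingFiles(Path dir) {
        this.dir = dir;
    }

    /** Writes (or overwrites) the posting list of a term in its own file. */
    void write(String term, int[] docIds) {
        String name = fileNames.get(term);
        if (name == null) {
            name = "postings-" + fileNames.size() + ".bin";
            fileNames.put(term, name);
        }
        ByteBuffer buf = ByteBuffer.allocate(docIds.length * Integer.BYTES);
        for (int id : docIds) {
            buf.putInt(id);
        }
        try {
            Files.write(dir.resolve(name), buf.array());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps only the requested slice of the file; 'from' must be >= 0. */
    int[] readSlice(String term, int from, int count) {
        String name = fileNames.get(term);
        if (name == null) {
            return new int[0]; // unknown term
        }
        try (FileChannel ch = FileChannel.open(dir.resolve(name),
                StandardOpenOption.READ)) {
            long total = ch.size() / Integer.BYTES;
            // Clamp the slice so it never runs past the end of the file.
            int n = (int) Math.max(0, Math.min(count, total - from));
            if (n == 0) {
                return new int[0];
            }
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY,
                    (long) from * Integer.BYTES, (long) n * Integer.BYTES);
            int[] out = new int[n];
            map.asIntBuffer().get(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, a call such as `store.readSlice("apple", 1, 3)` touches only the 12 bytes holding the second through fourth document IDs, so a posting list far larger than the heap can still be read piece by piece.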