There is a dataset of ~5 GB that contains one key-value pair per line. This dataset needs to be read on the order of a billion times to look up values by key.
I have already tried a disk-based approach with MapDB, but it throws a ConcurrentModificationException and doesn't seem mature enough to be used in a production environment yet.
I also don't want to put it in a DB and make the call a billion times (though a certain level of in-memory caching could be done in front of it).
Basically, I need to access this key-value dataset in the mapper/reducer of a Hadoop job step; a rough sketch of the access pattern is below.
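For context, this is roughly the lookup pattern in question, written as a minimal, backend-agnostic sketch: the store is opened once per mapper task in setup() and queried once per record in map(), with a small LRU cache in front of it (the "certain level of in-memory caching" mentioned above). KeyValueStore, openStore() and the "lookup.path" property are hypothetical placeholders for illustration, not a real API.

    import java.io.IOException;
    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {

        /** Hypothetical read-only view over the ~5 GB key-value dataset. */
        interface KeyValueStore {
            String get(String key);
            void close();
        }

        private static final int CACHE_SIZE = 100_000;

        private KeyValueStore store;
        private Map<String, String> cache;

        @Override
        protected void setup(Context context) {
            // Open the store once per mapper task, not once per record.
            store = openStore(context.getConfiguration().get("lookup.path"));
            // Simple LRU cache: access-ordered LinkedHashMap that evicts the
            // least recently used entry once CACHE_SIZE is exceeded.
            cache = new LinkedHashMap<String, String>(CACHE_SIZE, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > CACHE_SIZE;
                }
            };
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String key = line.toString().trim();   // key extraction is input-specific
            String value = cache.get(key);
            if (value == null) {
                value = store.get(key);            // one disk/DB lookup on a cache miss
                if (value != null) {
                    cache.put(key, value);
                }
            }
            if (value != null) {
                context.write(new Text(key), new Text(value));
            }
        }

        @Override
        protected void cleanup(Context context) {
            store.close();
        }

        // Backend-specific: wire this to MapDB, SQLite, or whatever store is chosen.
        private KeyValueStore openStore(String path) {
            throw new UnsupportedOperationException("not implemented in this sketch");
        }
    }

The point of the sketch is only that the per-record cost has to be a cheap local lookup, which is what rules out a remote DB call per record.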
So after trying out a bunch of things, we are now using SQLite.
Following is what we did: