How to sort items for faster insertion in the MapDB BTree?


So I have a list of around 20 million key-value pairs, and I'm storing the data in several MapDB databases in different ways, both to see how that affects my program's performance and for experiment's sake.

The thing is, inserting 20 million key-value pairs into a MapDB in random order takes quite a lot of time. So I would like to sort the list of key-value pairs first, so that I can insert them faster and therefore build databases from them faster.

So, how would I go about this?

I'd like to learn how to do this for MapDB's BTreeSet and BTreeMap, in other words, for MapDB collections that store a single value per key and for those that store multiple values for a single key.

EDIT: I forgot to mention that the keys and values are String objects.
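A minimal sketch of the in-memory presort step in plain Java, assuming the pairs are held as Map.Entry<String, String> objects and that the target collection uses the natural String ordering (String.compareTo). The class SortPairs and its sorted helper are hypothetical names. Ordering by key and then by value covers both cases in the question: for a map with unique keys the value tie-break never applies, and for a multi-value collection stored as a set of (key, value) pairs it matches the order such a sorted set would use.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortPairs {
    // Orders pairs by key, then by value (natural String order for both).
    static final Comparator<Map.Entry<String, String>> BY_KEY_THEN_VALUE =
            Comparator.comparing((Map.Entry<String, String> e) -> e.getKey())
                      .thenComparing(e -> e.getValue());

    // Returns a sorted copy so the caller's list is left untouched.
    public static List<Map.Entry<String, String>> sorted(List<Map.Entry<String, String>> pairs) {
        List<Map.Entry<String, String>> copy = new ArrayList<>(pairs);
        copy.sort(BY_KEY_THEN_VALUE);
        return copy;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> pairs = List.of(
                Map.entry("pear", "2"),
                Map.entry("apple", "9"),
                Map.entry("apple", "1"),
                Map.entry("fig", "5"));
        // Prints the pairs in ascending key order, with the two "apple" pairs ordered by value.
        for (Map.Entry<String, String> e : sorted(pairs)) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```

This only works when all 20 million pairs fit in memory at once; otherwise the sorting has to be done in batches.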


Solution

  • Use the built-in Data Pump to create a new BTreeMap. Its running time scales linearly with the number of records, and it sorts the data even if the data does not fit into memory.

    // randomIterator: an Iterator over the unsorted data to import
    Map newMap = db.createTreeMap("map")
        .pumpSource(randomIterator)  // source of data to import
        .pumpBatchSize(1000000)      // presort the source in batches; each batch must fit into memory
        .make();
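Sorting data that does not fit into memory, using batches that do, can be pictured as an external merge sort: sort the input in fixed-size runs, then merge the sorted runs. The sketch below illustrates that general idea in plain Java; it is not MapDB's implementation, the class BatchedPresort and its presort method are hypothetical, and unlike a real external sort it keeps the runs in memory instead of spilling them to disk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class BatchedPresort {
    // Sorts the strings from 'source' by building sorted runs of at most
    // batchSize elements and then merging the runs with a min-heap.
    public static List<String> presort(Iterator<String> source, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<String>> runs = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                batch.sort(Comparator.naturalOrder()); // one sorted run
                runs.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) { // final, possibly shorter, run
            batch.sort(Comparator.naturalOrder());
            runs.add(batch);
        }
        // Heap entries are {runIndex, positionInRun}, ordered by the string they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int r = 0; r < runs.size(); r++) {
            heap.add(new int[] {r, 0});
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> run = runs.get(top[0]);
            out.add(run.get(top[1]));
            if (top[1] + 1 < run.size()) { // advance within the same run
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k7", "k2", "k9", "k1", "k5", "k3", "k8");
        System.out.println(presort(keys.iterator(), 3));
    }
}
```

With batchSize set to 3, the seven keys in main form three sorted runs, and the heap merges them back into a single ascending list. Only one element per run sits in the heap at a time, which is what lets a disk-backed version keep memory use bounded by the batch size.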