Faster way to build inverted list in Mongo


I want to build an inverted list in my MongoDB collection. The collection looks like this:

{ "word" : 2, "docToPos" : { "1" : [ 0 ] } }
{ "word" : 5, "docToPos" : { "1" : [ 1 ] } }
{ "word" : 1, "docToPos" : { "1" : [ 2 ], "2" : [ 1 ] } }
{ "word" : 9, "docToPos" : { "2" : [ 2, 43, 1246 ] } }

word is an id from a dictionary and docToPos maps each document to the positions of that word in it. E.g. word 2 is in document 1 at position 0, and word 9 is in document 2 at positions 2, 43 and 1246.

Every new document that I want to add to the database is simply an array of word ids:

[23, 43, 75, 18, ... ]

So, using spring-mongo, I have this Java code:

for (int i=0; i < array.length; i++) {
  invertedListDao.upsert(array[i], documentId, i);
}

(I implemented the upsert method myself.)

This solution works, but if a document has 100 000 words, it takes 100 000 queries to mongo.

So finally, my question is: is there a way to do this faster? E.g. send the whole array at once and execute this in the db? I know there is an eval function in mongo, but there isn't one in spring-mongo.


Solution

  • One way to improve performance would be to use bulk upserts.

    var bulk = db.invertedListDao.initializeUnorderedBulkOp();
    for (var i=0; i < array.length; i++){
      bulk.find({...}).upsert().replaceOne({...})
    }
    bulk.execute();
    

    Why this is more efficient, and what kind of speed boost you can expect, is outlined in my answer here. Basically, you make a single bulk call to mongo instead of one query per word, no matter how many words you have.

    I am not familiar with Java spring-mongo, but a rudimentary search suggests that bulk operations are supported, so I hope you will be able to find out how to implement bulk upserts with your Java driver.

    P.S. With the help of Bartektartanus, here is the link to the official documentation.
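
As a complement to the bulk approach, here is a minimal Java sketch (not part of the original answer) of a preparation step that may cut the number of operations further: group the positions of each word id first, so that every distinct word in a document needs one upsert rather than one per occurrence. The class name InvertedListBatch and the helpers positionsByWord and positionsField are hypothetical names chosen for illustration. The code only prepares the data and prints what would be queued; it does not talk to MongoDB.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
class InvertedListBatch {

    // Groups the positions of each word id within one document, so each
    // distinct word needs one upsert instead of one per occurrence.
    static Map<Integer, List<Integer>> positionsByWord(int[] words) {
        Map<Integer, List<Integer>> positions = new LinkedHashMap<>();
        for (int i = 0; i < words.length; i++) {
            positions.computeIfAbsent(words[i], w -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    // Builds the field path that would hold the positions for one document,
    // e.g. "docToPos.2" for document id 2, matching the layout in the question.
    static String positionsField(long documentId) {
        return "docToPos." + documentId;
    }

    public static void main(String[] args) {
        int[] document = {9, 1, 9, 5};   // word ids of a new document
        long documentId = 2;

        // Each entry corresponds to one upsert that could be queued in a bulk operation.
        positionsByWord(document).forEach((word, pos) ->
            System.out.println("find {word: " + word + "} upsert: push "
                + pos + " into " + positionsField(documentId)));
    }
}
```

Each map entry could then be queued as one upsert in the bulk operation. An update that appends to the positions array (for example $push with $each) seems a better fit here than a full replace, since replacing the document would drop the positions stored for other documents. In Spring Data MongoDB, the BulkOperations interface obtained from MongoTemplate.bulkOps looks like the place to queue such upserts, but check this against the version you use.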