I want to build an inverted list in my MongoDB collection. The collection looks like this:
{ "word" : 2, "docToPos" : { "1" : [ 0 ] } }
{ "word" : 5, "docToPos" : { "1" : [ 1 ] } }
{ "word" : 1, "docToPos" : { "1" : [ 2 ], "2" : [ 1 ] } }
{ "word" : 9, "docToPos" : { "2" : [ 2, 43, 1246 ] } }
Here word is an ID from a dictionary, and docToPos maps a document ID to the word's positions in that document. For example, word 2 is in document 1 at position 0, and word 9 is in document 2 at positions 2, 43 and 1246.
Every new document that I want to add to the database is simply an array of word IDs:
[23, 43, 75, 18, ... ]
So, using spring-mongo, I have this Java code:
    for (int i = 0; i < array.length; i++) {
        // One upsert (one round trip to MongoDB) per word occurrence
        invertedListDao.upsert(array[i], documentId, i);
    }
(The upsert method is implemented by me.)
This solution works, but if a document has 100 000 words, it takes 100 000 queries to MongoDB.
So finally, my question is: is there a way to do this faster? E.g. send the whole array at once and execute this in the database? I know there is an eval function in MongoDB, but there isn't one in spring-mongo.
One way to improve performance would be to use bulk upserts.
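    var bulk = db.invertedListDao.initializeUnorderedBulkOp();
    for (var i = 0; i < array.length; i++) {
        // Only queues the upsert on the client; nothing is sent yet
        bulk.find({...}).upsert().replaceOne({...});
    }
    // Sends all queued operations to the server
    bulk.execute();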
The reason why it is more efficient, and what kind of speed boost you can expect, are outlined in my answer here, but basically you queue all the upserts on the client and send them with a single execute() call, no matter how many words you have, instead of making one round trip per word.
I am not familiar with Java Spring Mongo, but a quick search suggests that bulk operations are supported, so I hope you will be able to find out how to implement bulk upserts with your Java driver.
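On the Java side, one thing that might help before queuing the bulk operations is to group the positions by word first, so that each distinct word needs only one upsert rather than one per occurrence. The sketch below is only an illustration in plain Java: the class PostingsBuilder and the method positionsByWord are hypothetical names, not part of Spring or the MongoDB driver, and it assumes positions are zero-based array indexes as in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Spring or the MongoDB driver.
class PostingsBuilder {

    // Maps each distinct word ID to the list of positions at which it occurs
    // in the document (positions are zero-based array indexes).
    static Map<Integer, List<Integer>> positionsByWord(int[] wordIds) {
        // LinkedHashMap keeps words in order of first appearance.
        Map<Integer, List<Integer>> positions = new LinkedHashMap<>();
        for (int i = 0; i < wordIds.length; i++) {
            positions.computeIfAbsent(wordIds[i], k -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        int[] document = {23, 43, 75, 23, 18};
        // Each map entry would correspond to one queued upsert for this document.
        positionsByWord(document).forEach((word, pos) ->
                System.out.println("word " + word + " -> positions " + pos));
    }
}
```

Each entry of the resulting map could then be queued as a single upsert that adds the whole position list for the current document ID, which keeps the number of queued operations equal to the number of distinct words.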
P.S. With the help of Bartektartanus, here is the link to the official documentation.