java mongodb groovy batch-processing gmongo

Groovy gmongo batch processing


I'm trying to run a batch-processing job in Groovy with the GMongo driver. The collection is about 8 GB, and my problem is that my script tries to load everything into memory. Ideally I'd like to process it in batches, similar to what Spring Boot Batch does, but in Groovy scripts.

I've tried batchSize(), but it still retrieves the entire collection into memory before my logic is applied to it in batches.

Here's my example:

mongoDb.collection.find().collect { it ->
  // logic
}

Solution

  • After some deliberation I found that this solution works best, for the following reasons:

    1. Unlike iterating the cursor, it doesn't retrieve documents one at a time for processing (which can be terribly slow).
    2. Unlike the GMongo batch function, it doesn't try to load the entire collection into memory only to cut it up into batches for processing, which tends to be heavy on machine resources.

    The code below is efficient and light on resources, depending on your batch size.

    // Db.collectionName is a placeholder for your GMongo database and collection
    def skipSize = 0
    def limitSize = Integer.valueOf(1000) // batch size (if you hard-code it, the Integer conversion is unnecessary)
    def dbSize = Db.collectionName.count()

    // round up so that a final, partially filled batch is still processed
    def dbRunCount = Math.ceil(dbSize / limitSize) as int

    dbRunCount.times {
        Db.collectionName.find()
                .skip(skipSize)
                .limit(limitSize)
                .each { event ->
                    // run your business logic processing
                }

        // calculate the next skipSize
        skipSize += limitSize
    }
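
One detail worth checking with skip/limit paging is that the passes cover the last, partially filled batch: with 2,400 documents and a batch size of 1,000 there are three pages, whereas rounding 2.4 to the nearest integer gives two. The sketch below is plain Groovy with no database involved, so it can be run on its own. `skipOffsets` is a hypothetical helper name used only for illustration; it is not part of GMongo.

```groovy
// Hypothetical helper (not a GMongo API): lists the skip offsets needed to
// cover `total` documents in pages of `batchSize` documents each.
List<Long> skipOffsets(long total, int batchSize) {
    if (batchSize <= 0) {
        throw new IllegalArgumentException('batchSize must be positive')
    }
    def offsets = []
    // Stepping until the offset reaches `total` visits a trailing partial page too.
    for (long skip = 0; skip < total; skip += batchSize) {
        offsets << skip
    }
    return offsets
}

long total = 2400
int batchSize = 1000

// Three pages: documents 0-999, 1000-1999 and 2000-2399.
assert skipOffsets(total, batchSize) == [0L, 1000L, 2000L]

// Rounding to the nearest integer would plan only two passes here.
assert Math.round((double) total / batchSize) == 2
```

Each offset would be passed to skip(), with the batch size passed to limit(), in the same way as in the loop above.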