Search code examples
hibernategrailsgrails-orm

Grails batch read optimization


I need to re-index all of my domain instances from a live legacy database on a regular basis using a custom ElasticSearch implementation.

All of the suggestions I have found so far have all been for optimizing batch write operations.

Many of the tables have hundreds of thousands if not millions of records and the simple solution of domain.list().each { ... } appears to load the entire dataset into memory at once.

Using Hibernate ScrollableResults seems like a good solution but it works at the RecordSet level and doesn't recreate the actual domain objects. I could use the id field to read() the instance but that seems inefficient.

The Searchable plugin seems to perform an efficient reindex operation.

What other options or optimizations are there for batch reading domain instances successfully?


UPDATE

I recently stumbled across a 4 year old thread on the Grails mailing list with an answer from @BurtBeckwith that suggests using a ScrollableResultSet would be inefficient. He recommends paging through the records using standard GORM methods and clearing the session in batches.


Solution

  • 1) Using a tradition Java for loop will save you a little bit of overhead.

    2) Chunking the data might help. You can use max: and offset: to get chunks of 100 or 1000 that way you're always dealing with a smaller set: http://grails.org/doc/latest/ref/Domain%20Classes/list.html There might be issues with this though if you're modifying the objects in that table or if new data gets added in the middle of a workflow. (I'll leave that for you to test :)

    3) Using SQL or HSQL to get a hold of all of the ids then using Grails .load() might help as well. http://grails.org/doc/latest/ref/Domain%20Classes/load.html

    4) I've tried to implement some large batch jobs using Grails/Hibernate before and found that it just really didn't work well. I'd advise you to work directly with the database using SQL if at all possible. It will by far be the quickest.