We need to seed an application with 3 million entities before running performance tests. The 3 million entities should be loaded through the application to simulate 3 years of real data.
We are inserting 1-5000 entities at a time. In the beginning response times are very good. But after a while they decay exponentially.
We use at groovy script to hit a URL to start each round of insertions.
1) Cleanup GORM after each 100 insertions:
def session = sessionFactory.currentSession
session.flush()
session.clear()
DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP.get().clear()
(old Ted Naleid trick: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql)
2) We use GPars for parallel insertions:
GParsPool.withPool {
(0..<1000).eachParallel {
def entity = new Entity(...)
insertionService.insert(entity)
}
}
I'm really hoping somebody have a good idea on how to fix the problem.
The pausing would make me think it's the JVM doing garbage collection. Have you used a profiler such as VisualVM to see what time is being spent doing garbage collection? Typically this will be the best approach to understanding what is happening with your application within the JVM.
Also, it's far better to load the data directly into the database rather than using your application if you are trying to "seed" the application. Performance wise of course.
(Added as answer per comment)