I'm importing 28k documents into MongoDB using the DoctrineMongoDBBundle (Symfony 2.7.4) by foreach-looping through the source collection. Although it works as expected, I'm wondering how to optimize performance. I discovered that importing the first 1000 documents only takes a blink of an eye, but the import slows down with every flush. Does it make sense to split the source collection and import e.g. 100 documents at a time? How often would you flush?
Are there any best practices?
Thanks for your suggestions!
It all depends on the available memory and the size of your documents. You can check the size of the unit of work with $dm->getUnitOfWork()->size()
. I suspect you aren't detaching documents after flushing; that's why everything slows down. Call clear()
after flush()
to detach the documents from Doctrine.
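To see the effect, you can log the unit-of-work size and PHP's memory usage around each flush. This is a minimal sketch assuming it runs inside your import loop; $dm is your DocumentManager and $logger is any PSR-3 logger you have available:

    // Number of documents currently managed by the unit of work
    $sizeBefore = $dm->getUnitOfWork()->size();

    $dm->flush();
    $dm->clear();

    // With clear() in place, memory usage should stay roughly flat per batch
    $logger->info(sprintf(
        'Flushed batch: %d managed documents, %.1f MB in use',
        $sizeBefore,
        memory_get_usage(true) / 1048576
    ));

If the reported memory keeps growing despite clear(), something else (e.g. an event listener or a logger collecting queries) is holding references to the documents.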
For example, the following persists documents in batches of 100: it flushes each batch in a single operation, detaches the flushed documents from Doctrine, and repeats until all of $documents
have been processed:
$batchSize = 100;
$i = 1;

foreach ($documents as $document) {
    $dm->persist($document);

    // Flush and detach once per batch so the unit of work stays small
    if (($i % $batchSize) === 0) {
        $dm->flush();
        $dm->clear();
    }

    $i++;
}

// Flush and detach the remaining documents of the last, partial batch
$dm->flush();
$dm->clear();