I have a data source that I'm using a for loop to process. The data source can sometimes have duplicates. I'm looping over the data source and creating "item" entities. I'm trying to avoid those duplicates but I think that since the items have not been sent to the database they are not found during the duplicate check.
Here is my pseudo for loop:
foreach ($datasource['data'] as $post) {
    $dupe = $em->getRepository('AppBundle:Item')->findOneByDatasourceId($post['id']);
    if (!$dupe) {
        // process the item
        $item = new Item();
        $item->setDatasourceId($post['id']);
        $em->persist($item);
    }
}
$em->flush();
This does not find the duplicates within the same batch.
How do I find duplicates when the data has not been sent to the database yet? I was under the impression that the entity manager would have known about the data that has yet to be pushed.
Thanks
Repository finders such as findOneByDatasourceId() query the database, so they do not see items that have been persisted but not yet flushed. Those items are held in the EntityManager's unit of work object and, in theory, you could check it, but it's a bit of a pain. As @Matteo has suggested, you could also flush after each persist, but that can hurt performance.
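To give an idea of what checking the unit of work might involve, here is a rough sketch. Doctrine's UnitOfWork exposes getScheduledEntityInsertions(), which returns the entities that have been persisted but not flushed. The helper name findPendingItem is made up for illustration, and the getDatasourceId() getter is an assumption (the question only shows the setter). The Item class below is just a stand-in so the snippet runs on its own.

```php
<?php
// Stand-in for the real AppBundle Item entity, only so this sketch is self-contained.
// getDatasourceId() is an assumed getter; the question only shows setDatasourceId().
class Item
{
    private $datasourceId;

    public function setDatasourceId($id)
    {
        $this->datasourceId = $id;
    }

    public function getDatasourceId()
    {
        return $this->datasourceId;
    }
}

/**
 * Hypothetical helper: scan entities scheduled for insertion (persisted but
 * not yet flushed) for an Item with the given datasource id.
 * Ids are compared as strings so that 42 and '42' match.
 */
function findPendingItem(array $scheduledInsertions, $datasourceId)
{
    foreach ($scheduledInsertions as $entity) {
        if ($entity instanceof Item
            && (string) $entity->getDatasourceId() === (string) $datasourceId) {
            return $entity;
        }
    }

    return null;
}

// With Doctrine, the list would come from the unit of work:
// $pending = $em->getUnitOfWork()->getScheduledEntityInsertions();
// $dupe = findPendingItem($pending, $post['id']);
```

Note that this scans every pending entity for every post, so the work grows quickly with the size of the batch, and you would still need the repository lookup for rows already in the database.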
It's easy enough to make your own local cache:
$datasourceCache = [];
foreach ($datasource['data'] as $post) {
    $postId = $post['id'];
    // Only handle ids we have not already seen in this batch
    if (!isset($datasourceCache[$postId])) {
        $datasourceCache[$postId] = true;
        // Still check the database for items saved in earlier runs
        $dupe = $em->getRepository('AppBundle:Item')->findOneByDatasourceId($postId);
        if (!$dupe) {
            // process the item
            $item = new Item();
            $item->setDatasourceId($postId);
            $em->persist($item);
        }
    }
}
$em->flush();