symfony, duplicates, entitymanager

Check for duplicates in for loop before entity manager is flushed


I'm processing a data source in a for loop. The data source can sometimes contain duplicates. As I loop over it, I create "item" entities. I'm trying to avoid those duplicates, but I think that because the items have not been sent to the database yet, they are not found during the duplicate check.

Here is my pseudo for loop:

foreach($datasource['data'] as $post){
    $dupe = $em->getRepository('AppBundle:Item')->findOneByDatasourceId($post['id']);
    if(!$dupe){
        //process the item
        $item = new Item();
        $item->setDatasourceId($post['id']);
        $em->persist($item);
    }
}

$em->flush();

This does not find the duplicates.

How do I find duplicates when the data has not been sent to the database yet? I was under the impression that the entity manager would know about entities that have been persisted but not yet flushed.

Thanks


Solution

  • EntityManager::find and repository finders such as findOneByDatasourceId do not check entities that are waiting to be persisted. Those pending entities are held in the unit of work object and, in theory, you could check it, but it's a bit of a pain. As @Matteo suggested, you could also flush after each persist, but that can hurt performance.

    It's easy enough to make your own local cache:

    $datasourceCache = [];
    foreach($datasource['data'] as $post){
        $postId = $post['id'];
        // Only handle each datasource id once per run
        if (!isset($datasourceCache[$postId])) {
            $datasourceCache[$postId] = true;
            $dupe = $em->getRepository('AppBundle:Item')->findOneByDatasourceId($postId);
            if(!$dupe){
                //process the item
                $item = new Item();
                $item->setDatasourceId($postId);
                $em->persist($item);
            }
        }
    }
    $em->flush();
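
A variation on the same idea, offered only as a sketch: strip in-batch duplicates from the source array before the loop, so the loop body needs only the database check. The helper name `uniqueById` and the sample data are hypothetical and not part of the original question. It assumes every row has an `id` key, as in the code above. This only removes duplicates inside one batch, so the `findOneByDatasourceId` lookup is still needed for items saved by earlier runs.

```php
<?php
declare(strict_types=1);

/**
 * Keep only the first row seen for each id (hypothetical helper).
 *
 * @param array<int, array<string, mixed>> $rows rows that each contain an 'id' key
 * @return array<int, array<string, mixed>> rows with in-batch duplicates removed
 */
function uniqueById(array $rows): array
{
    $seen = [];
    $unique = [];
    foreach ($rows as $row) {
        $id = $row['id'];
        if (isset($seen[$id])) {
            continue; // duplicate within this batch, skip it
        }
        $seen[$id] = true;
        $unique[] = $row;
    }
    return $unique;
}

// Sample data, invented for illustration.
$datasource = ['data' => [
    ['id' => 10, 'title' => 'first'],
    ['id' => 11, 'title' => 'second'],
    ['id' => 10, 'title' => 'first again'],
]];

$posts = uniqueById($datasource['data']);
echo implode(',', array_column($posts, 'id')), PHP_EOL; // prints "10,11"
```

The original loop could then iterate over `$posts` instead of `$datasource['data']`, keeping the repository lookup and `persist` calls unchanged.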