php mongodb hhvm

Iterating over a big MongoDB collection without running out of memory


I have a large Mongo collection that I want to iterate over, so I do something like this:

$cursor = $mongo->my_big_collection->find([]);

foreach ($cursor as $doc)
    do_something($doc);

But I eventually run out of memory. I expected the cursor to free the memory after each document was processed. Why isn't that the case? I tried calling unset($doc) at the end of my loop but that didn't help.

Right now I have to do something like this to get around the issue (processing the documents in batches and calling unset() on the cursor after each batch):

for ($skip = 0; true; $skip += 1000)
{
    $cursor = $mongo->my_big_collection->find()->skip($skip)->limit(1000);

    if (!$cursor->hasNext())
        break;

    foreach ($cursor as $doc)
        do_something($doc);

    unset($cursor);
}

This seems awkward. The whole point of iterators is to not have to do this. Is there a better way?

I'm using hhvm 3.12 with mongofill.

Thank you for your help.


Solution

  • MongoCursor.php

    /**
     * Advances the cursor to the next result
     *
     * @return void - NULL.
     */
    public function next()
    {
        $this->doQuery();
        $this->fetchMoreDocumentsIfNeeded(); // <<< add documents to $this->documents
    
        $this->currKey++;
    }
    
    /**
     * Return the next object to which this cursor points, and advance the
     * cursor
     *
     * @return array - Returns the next object.
     */
    public function getNext()
    {
        $this->next();
    
        return $this->current();
    }
    

    When you iterate over the cursor, every document it fetches is stored in the cursor's $this->documents array. Nothing ever clears that array, so memory grows with the number of documents read. You could try implementing an iteration that removes each document from $this->documents once it has been returned.
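
    As a rough illustration of that idea, here is a minimal, self-contained sketch. It is not mongofill's actual code: DrainingCursor, iterate(), bufferedCount() and the $fetchBatch callable are all hypothetical names, and $fetchBatch only stands in for a round-trip to the server. The point is just that each document is removed from the buffer before it is handed to the caller, so the buffer never holds more than one batch.

```php
<?php
// Hypothetical, simplified model of a buffering cursor (NOT mongofill's real class).
class DrainingCursor
{
    private $documents = array(); // buffered documents, keyed by position
    private $fetchBatch;          // assumed callable: function ($offset, $limit) returning an array
    private $batchSize;
    private $fetched = 0;         // number of documents pulled from the source so far

    public function __construct(callable $fetchBatch, $batchSize = 100)
    {
        $this->fetchBatch = $fetchBatch;
        $this->batchSize = $batchSize;
    }

    // Number of documents currently held in memory by the cursor.
    public function bufferedCount()
    {
        return count($this->documents);
    }

    // Generator that yields documents one by one and forgets each one once yielded.
    public function iterate()
    {
        for ($key = 0; ; $key++) {
            if (!isset($this->documents[$key])) {
                // Buffer is empty at this position: fetch the next batch.
                $batch = call_user_func($this->fetchBatch, $this->fetched, $this->batchSize);
                if (count($batch) === 0) {
                    return; // source exhausted
                }
                foreach ($batch as $doc) {
                    $this->documents[$this->fetched++] = $doc;
                }
            }

            $doc = $this->documents[$key];
            unset($this->documents[$key]); // drop it from the buffer before handing it out
            yield $key => $doc;
        }
    }
}

// Demo with an in-memory array standing in for the collection.
$source = array();
for ($i = 1; $i <= 10; $i++) {
    $source[] = array('_id' => $i);
}
$fetch = function ($offset, $limit) use ($source) {
    return array_slice($source, $offset, $limit);
};

$cursor = new DrainingCursor($fetch, 4);
$seen = 0;
$maxBuffered = 0;
foreach ($cursor->iterate() as $doc) {
    $seen++;
    $maxBuffered = max($maxBuffered, $cursor->bufferedCount());
}
echo "seen=$seen, largest buffer observed=$maxBuffered\n";
```

    In a real cursor the same change would presumably go wherever current() or next() reads from $this->documents. Note that dropping documents this way would make rewinding the cursor impossible without re-running the query.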