Search code examples
phpmongodbmongo-collection

MongoDB: "find()" with foreign-key to different collection


Sorry if this might be a typical "RTM"-question, I am new to MongoDB and did some manual-reading but sadly I didn't find an attempt to solve that.

I have two collections, one collection is "articles" containing an array of "categories" which has one or more MongoID objects with IDs of my categories-collection.

I would like to display all categories with the number of articles refering to the category. Below my solution I found atfer some time of researching:

my collection of categories:

Array
(
    [_id] => MongoId Object
        (
            [$id] => 54eb1510974f5590179702aa
        )

    [name] => Test
    [multiplier] => 2    
)

My collection of articles:

Array
(
    [_id] => MongoId Object
        (
            [$id] => 54e5e39f974f5535248b4bdf
        )

    [productnumber] => 63483

    [categories] => Array
        (
            //... other categories...
            [1] => MongoId Object
                (
                    [$id] => 54eb1510974f5590179702aa
                )

        )
    [image] => /var/www/mongodbtest/Files/FTP/images/63483.jpg
)

My current PHP code:

foreach($oAllCategories as $oCategory)
{            
    $iArticleCount = $oArticles->find(array('categories' => $oCategory['_id']))->count();   
    // Debug
    echo $oCategory['name'].' = '.$iArticleCount.' <br />';
}  

Now the problem is, that with 70'000 articles and 2'200 categories this is slow and takes a lot of time. Also I can't sort my categories by the number of articles without iterating through all articles.

Is there a better way to do this?


Solution

  • I'm not familiar with PHP, so I'll use mongo shell syntax. You can use an aggregation pipeline to compute this server-side in one go:

    db.articles.aggregate([
        { "$unwind" : "$categories" },
        { "$group" : { "_id" : "$categories", "count" : { "$sum" : 1 } } }
    ])
    

    The $unwind stage "unwinds" each article document along its categories array, e.g.

    { "x" : 1, "categories" : ["a", "b", "c"] }
    ===>
    { "x" : 1, "categories" : "a" },
    { "x" : 1, "categories" : "b" },
    { "x" : 1, "categories" : "c" }
    

    Then the $group stage merges all the documents along the values of categories and counts the number of elements in the group. The result looks like

    { "_id" : "c", "count" : 1 }
    { "_id" : "b", "count" : 1 }
    { "_id" : "a", "count" : 1 }
    

    Your _id's would be category _id's, which you could join with the categories collection to turn into names. I think you should just store the category name along with the _id on the article, though. How often does a category name actually change?

    Generally, you should avoid doing operations like this, though, because the aggregation is scanning every article, expanding it into multiple documents, the processing every one into its corresponding group. It's better to incrementally maintain this information in another collection For example, you could increment a count in each category document every time an article in that category is inserted.