mongodb spring-data-mongodb mongodb-java

MongoDB: count the occurrences of each array element with MapReduce


Say every document in a collection has a string array. How can I count how many times each element of the array occurs across the whole collection? Right now I can find all the distinct elements, but the MapReduce functions are a little tricky and I haven't fully understood them.

Doc A    
{
_id:
name:
actors: ["a", "b", "c"]
}

Doc B     
{
_id:
name:
actors: ["a", "d"]
}

Doc C   
{
_id:
name:
actors: ["a", "c", "f"]
}

I want to get a statistical result like a:3 b:1 c:2 d:1 f:1.


Solution

  • An alternative route that you could take is the aggregation framework. Consider the above collection as an example.

    Populate test collection:

    db.collection.insert([
        { "_id" : 1, "name" : "ABC1", "actors": ["a", "b", "c"] },
        { "_id" : 2, "name" : "ABC2", "actors" : ["a", "d"] },
        { "_id" : 3, "name" : "XYZ1", "actors" : ["a", "c", "f"] }
    ])
    

    Using MongoDB 3.4.4 or newer:

    db.collection.aggregate([
        { "$unwind" : "$actors" },
        { "$group": { "_id": "$actors", "count": { "$sum": 1} } },
        { "$group": {
            "_id": null,
            "counts": {
                "$push": {
                    "k": "$_id",
                    "v": "$count"
                }
            }
        } },
        { "$replaceRoot": {
            "newRoot": { "$arrayToObject": "$counts" }
        } }    
    ])
    

    Output

    {
        a: 3,
        b: 1,
        c: 2,
        d: 1,
        f: 1
    }
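
To make the stages of the pipeline above more concrete, here is a rough plain-JavaScript stand-in that applies the same steps to the three sample documents in memory. It is only an illustration of what each stage does, not how the server implements it; the variable names (unwound, tally, grouped, counts, result) are made up for this sketch, and the server does not promise any particular group order.

```javascript
// Plain JavaScript stand-in for the pipeline; no MongoDB connection is used.
const docs = [
    { _id: 1, name: "ABC1", actors: ["a", "b", "c"] },
    { _id: 2, name: "ABC2", actors: ["a", "d"] },
    { _id: 3, name: "XYZ1", actors: ["a", "c", "f"] }
];

// $unwind: one output document per element of the actors array.
const unwound = docs.flatMap(doc =>
    doc.actors.map(actor => ({ ...doc, actors: actor }))
);

// First $group: bucket by actor and add 1 per document ($sum: 1).
const tally = new Map();
for (const doc of unwound) {
    tally.set(doc.actors, (tally.get(doc.actors) || 0) + 1);
}
const grouped = [...tally].map(([actor, count]) => ({ _id: actor, count: count }));

// Second $group (_id: null): push every { k, v } pair into one array.
const counts = grouped.map(g => ({ k: g._id, v: g.count }));

// $arrayToObject + $replaceRoot: turn the pairs into a single document.
const result = Object.fromEntries(counts.map(({ k, v }) => [k, v]));

console.log(result);
```

The intermediate unwound array holds eight documents, one per array element, which is why summing 1 per document in each group yields the occurrence count.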
    

    Using MongoDB 3.2 and below:

    The following aggregation pipeline uses the $unwind stage to output one document for each element of the actors array. The $group stage then groups those documents by the array value and counts the documents in each group with the $sum operator, which gives the number of occurrences of each element:

    db.collection.aggregate([
        { "$unwind" : "$actors" },
        { "$group": { "_id": "$actors", "count": { "$sum": 1} } }
    ])
    

    The operation returns the following results, which come close to what you expect but give one document per element rather than a single document of key/value pairs:

    /* 0 */
    {
        "result" : [ 
            {
                "_id" : "f",
                "count" : 1
            }, 
            {
                "_id" : "d",
                "count" : 1
            }, 
            {
                "_id" : "c",
                "count" : 2
            }, 
            {
                "_id" : "b",
                "count" : 1
            }, 
            {
                "_id" : "a",
                "count" : 3
            }
        ],
        "ok" : 1
    }
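
Since the question asked about MapReduce, here is a minimal sketch of what the map and reduce functions could look like for this count. The function names mapActors and reduceCounts are made up for illustration, and runLocally is a hypothetical in-memory harness (not a MongoDB API) that only imitates the server closely enough to try the two functions without a database. The final loop folds the per-key rows into the a:3 b:1 ... shape from the question.

```javascript
// map: runs once per document; `this` is the document being processed.
// emit(key, value) is supplied by the server when run through mapReduce.
function mapActors() {
    this.actors.forEach(function (actor) {
        emit(actor, 1);
    });
}

// reduce: receives the values emitted for one key and returns their sum.
function reduceCounts(key, values) {
    return values.reduce(function (total, n) { return total + n; }, 0);
}

// In the mongo shell the pair would be handed to the server roughly as:
//   db.collection.mapReduce(mapActors, reduceCounts, { out: { inline: 1 } })

// Hypothetical harness (an assumption, not part of MongoDB): imitates emit()
// and the per-key reduce call so the functions can run without a server.
function runLocally(docs, map, reduce) {
    const emitted = new Map();
    globalThis.emit = function (key, value) {
        if (!emitted.has(key)) {
            emitted.set(key, []);
        }
        emitted.get(key).push(value);
    };
    docs.forEach(function (doc) { map.call(doc); });
    delete globalThis.emit;
    return Array.from(emitted, function ([key, values]) {
        // Like the server, skip reduce when a key has a single value.
        const value = values.length === 1 ? values[0] : reduce(key, values);
        return { _id: key, value: value };
    });
}

const docs = [
    { _id: 1, name: "ABC1", actors: ["a", "b", "c"] },
    { _id: 2, name: "ABC2", actors: ["a", "d"] },
    { _id: 3, name: "XYZ1", actors: ["a", "c", "f"] }
];

const results = runLocally(docs, mapActors, reduceCounts);

// Fold the { _id, value } rows into one key/value object.
const stats = {};
results.forEach(function (row) { stats[row._id] = row.value; });

console.log(stats);
```

The same fold should work on the aggregation result array above if row.value is swapped for row.count. Note that mapReduce is deprecated in recent MongoDB releases in favor of the aggregation pipeline, so this is mainly useful for understanding the older approach.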