python-3.x, mongodb, distinct-values

Find distinct values grouped by another field in MongoDB


I have a collection with documents like this:

{
    "_id" : ObjectId("5c0685fd6afbd73b80f45338"),
    "page_id" : "1234",
    "category_list" : [  
        "football", 
        "sport"
    ],
    "time_broadcast" : "09:13"
}

{
    "_id" : ObjectId("5c0685fd6afbd7355f45338"),
    "page_id" : "1234",
    "category_list" : [ 
        "sport",
        "handball"
    ],
    "time_broadcast" : "09:13"
}

{
    "_id" : ObjectId("5c0694ec6afbd74af41ea4af"),
    "page_id" : "123456",
    "category_list" : [ 
        "news", 
        "updates"
    ],
    "time_broadcast" : "09:13"
}

....

import datetime

now = datetime.datetime.now().time().strftime("%H:%M")

What I want is: for the documents whose "time_broadcast" equals now, get the list of distinct "category_list" values for each "page_id".

Here is what the output should look like:

[
   {
     "page_id" : "1234",
     "category_list" : ["football", "sport", "handball"]
   },
   {
     "page_id" : "123456",
     "category_list" : ["news", "updates"]
   }
]
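If the matching documents are few enough to pull into the application, the same grouping can also be done client-side. The sketch below is a plain-Python illustration, not a MongoDB feature; `group_distinct_categories` is a hypothetical helper name, and the sample documents are the ones from the question with `_id` left out. It keeps the first-seen order of both pages and categories.

```python
from typing import Dict, Iterable, List


def group_distinct_categories(docs: Iterable[dict]) -> List[dict]:
    """Collect the distinct category_list values per page_id, client-side.

    Pages and categories keep the order in which they are first seen.
    """
    grouped: Dict[str, List[str]] = {}
    for doc in docs:
        categories = grouped.setdefault(doc["page_id"], [])
        for category in doc.get("category_list", []):
            if category not in categories:  # skip repeats such as "sport"
                categories.append(category)
    return [{"page_id": pid, "category_list": cats} for pid, cats in grouped.items()]


# Sample documents from the question (_id omitted).
docs = [
    {"page_id": "1234", "category_list": ["football", "sport"], "time_broadcast": "09:13"},
    {"page_id": "1234", "category_list": ["sport", "handball"], "time_broadcast": "09:13"},
    {"page_id": "123456", "category_list": ["news", "updates"], "time_broadcast": "09:13"},
]

print(group_distinct_categories(docs))
```

Because a pymongo cursor yields dicts, the helper could be fed directly with `db.users.find({'time_broadcast': now})`. The `in` check on a list is linear, which is fine for short category lists; for long ones a set alongside the list would be faster.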

I have tried this:

category_list = db.users.find({'time_broadcast': now}).distinct("category_list")

but this returns the distinct values across all "page_id"s:

 ["football", "sport", "handball", "news", "updates"]

instead of the distinct "category_list" values for each "page_id".

Any help, please?

Thanks


Solution

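  • You need to write an aggregation pipeline:

    • $match - filter the documents by criteria (here, time_broadcast)
    • $group - group the documents by the key field (page_id)
    • $addToSet - collect the unique category_list arrays of each group (uniqueness applies to whole arrays, not to individual categories)
    • $project - reshape each document into the required format
    • $reduce - flatten the array of arrays into a single array with $concatArrays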
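Since the question uses Python, here is a sketch of a pymongo-style pipeline (a plain list of dicts). It swaps one technique: instead of the $reduce/$concatArrays step, it runs $unwind before $group, so $addToSet receives single category strings and drops repeated ones. `build_pipeline` is a hypothetical helper name; the code only builds the pipeline, so it runs without a database.

```python
from typing import Any, Dict, List


def build_pipeline(now: str) -> List[Dict[str, Any]]:
    """Aggregation pipeline: distinct categories per page_id for one "HH:MM" slot."""
    return [
        # Keep only the documents broadcast at the given time.
        {"$match": {"time_broadcast": now}},
        # Emit one document per array element, so $addToSet sees single strings.
        {"$unwind": "$category_list"},
        # One output document per page_id; $addToSet drops repeated categories.
        {"$group": {"_id": "$page_id", "category_list": {"$addToSet": "$category_list"}}},
        # Expose the group key as page_id and hide _id.
        {"$project": {"_id": 0, "page_id": "$_id", "category_list": 1}},
        # Order pages by page_id (order inside each category set is not guaranteed).
        {"$sort": {"page_id": 1}},
    ]
```

With pymongo you would pass it to the collection from the question, e.g. `list(db.users.aggregate(build_pipeline(now)))`. One trade-off of $unwind: documents whose category_list is missing or empty are dropped unless you use its preserveNullAndEmptyArrays option.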

    aggregate query

    db.tt.aggregate([
        {$match : {"time_broadcast" : "09:13"}},
        {$group : {"_id" : "$page_id", "category_list" : {$addToSet : "$category_list"}}},
        {$project : {
            "_id" : 0,
            "page_id" : "$_id",
            "category_list" : {$reduce : {input : "$category_list", initialValue : [], in : {$concatArrays : ["$$value", "$$this"]}}}
        }}
    ]).pretty()
    

    result

    { "page_id" : "123456", "category_list" : [ "news", "updates" ] }
    {
            "page_id" : "1234",
            "category_list" : [
                    "sport",
                    "handball",
                    "football",
                    "sport"
            ]
    }
    

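    You can add a $sort stage on page_id if required. Note that $addToSet removes duplicate category_list arrays, not duplicate categories, which is why "sport" appears twice for page_id "1234" above. To get truly distinct categories, use $setUnion instead of $concatArrays inside $reduce (the order of the resulting array is not guaranteed), or $unwind "$category_list" before the $group stage.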
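A small plain-Python model of the $reduce step (an illustration only, not MongoDB code) makes the difference between joining with $concatArrays and merging with $setUnion concrete, using the two arrays collected for page_id "1234" in the result above:

```python
from functools import reduce
from typing import List

# The two arrays $addToSet collected for page_id "1234" in the result above.
# $addToSet compares whole arrays, and these two differ, so both were kept.
collected: List[List[str]] = [["sport", "handball"], ["football", "sport"]]

# Model of $reduce with $concatArrays: plain concatenation keeps duplicates.
concatenated = reduce(lambda value, this: value + this, collected, [])

# Model of $reduce with $setUnion: a set union drops duplicates.
# MongoDB does not guarantee the order of a $setUnion result; sorted() is
# only used here to make the printed output deterministic.
unioned = sorted(reduce(lambda value, this: value | set(this), collected, set()))

print(concatenated)  # ['sport', 'handball', 'football', 'sport']
print(unioned)       # ['football', 'handball', 'sport']
```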