Tags: python, python-3.x, mongodb, pymongo, pymongo-3.x

Pymongo Data Aggregation group by date and count of distinct column values for each date group


I have a collection in a MongoDB database that I am trying to aggregate using Python. The documents are in the following format:

[{'date': '01-01-2022', 'orderid': 1001},
 {'date': '01-01-2022', 'orderid': 1001},
 {'date': '01-01-2022', 'orderid': 1002},
 {'date': '02-01-2022', 'orderid': 1003},
 {'date': '02-01-2022', 'orderid': 1003},
 {'date': '02-01-2022', 'orderid': 1003},
 {'date': '02-01-2022', 'orderid': 1004},
 {'date': '02-01-2022', 'orderid': 1005},
 {'date': '03-01-2022', 'orderid': 1006},
 {'date': '03-01-2022', 'orderid': 1007}]

I want to group the data by the date field and get the count of unique orderid values for each date group. Expected output:

[{'date': '01-01-2022', 'count_orderid': 2},
 {'date': '02-01-2022', 'count_orderid': 3},
 {'date': '03-01-2022', 'count_orderid': 2}]

I have tried the following code and various combinations of it, but it raises an error.

agg_result= collection.aggregate(
    [{
    "$group" : 
        {"_id" : "$date"
         }}.distinct('orderid').length
    ])

Solution

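  • There currently isn't a "$group" accumulator operator that counts distinct values directly. You could write a custom "$accumulator" in JavaScript, but there's an easier way: use a "$group" stage that collects the unique orderid values for each date into an array, followed by a "$project" stage that returns the size of that array.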

    db.collection.aggregate([
      {
        "$group": {
          "_id": "$date",
          // make array of all unique orderid in group
          "orderIds": {"$addToSet": "$orderid"}
        }
      },
      {
        "$project": {
          "_id": 0,
          "date": "$_id",
          // number of unique orderid in group is the size of the array
          "count_orderid": {"$size": "$orderIds"}
        }
      }
    ])
    

    Try it on mongoplayground.net.
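
Since the question uses PyMongo, the same pipeline can be written as plain Python dicts. The following is a minimal sketch, not a definitive implementation. It assumes `collection` is a `pymongo` `Collection` object (or anything exposing `.aggregate(pipeline)`). The names `PIPELINE`, `count_unique_orders`, and `count_unique_orders_local` are hypothetical, and the final "$sort" stage is an optional addition that is not part of the answer above.

```python
# Same two stages as the shell example, expressed as Python dicts.
PIPELINE = [
    {
        "$group": {
            "_id": "$date",
            # collect every distinct orderid seen for this date
            "orderIds": {"$addToSet": "$orderid"},
        }
    },
    {
        "$project": {
            "_id": 0,
            "date": "$_id",
            # the number of distinct orderids is the size of that array
            "count_orderid": {"$size": "$orderIds"},
        }
    },
    # Optional addition (assumption): $group does not guarantee output order.
    # Note that this sorts the date strings lexicographically.
    {"$sort": {"date": 1}},
]


def count_unique_orders(collection):
    """Run the pipeline; `collection` is assumed to be a pymongo Collection."""
    return list(collection.aggregate(PIPELINE))


def count_unique_orders_local(docs):
    """Pure-Python equivalent, handy for checking results without a server."""
    ids_by_date = {}
    for doc in docs:
        ids_by_date.setdefault(doc["date"], set()).add(doc["orderid"])
    return [
        {"date": date, "count_orderid": len(ids)}
        for date, ids in sorted(ids_by_date.items())
    ]
```

With the sample documents from the question, `count_unique_orders_local` produces the expected output shown there, so it can serve as a quick check of what `count_unique_orders(collection)` should return against a real database.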