Search code examples
pythonmongodbaggregation-frameworkpymongo

Count with MongoDB aggregate $group result


I have a database query with pymongo , like so

pipeline = [
    {"$group": {"_id": "$product", "count": {"$sum": 1}}},
]
rows = list(collection_name.aggregate(pipeline))
print(rows)

The result like to

[
    {'_id': 'p1', 'count': 45},
    {'_id': 'p2', 'count': 4},
    {'_id': 'p3', 'count': 96},
    {'_id': 'p1', 'count': 23},
    {'_id': 'p4', 'count': 10}
]

Objective:

On the basis of the above results, I want to conduct statistics between partitions. For example, To get the number of count times in the following intervals:

partition, count
(0, 10], 2
[11, 50), 2
[50, 100], 1

Is there a way of doing this entirely using MongoDB aggregate framework?

Any comments would be very helpful. Thanks.


Answer by @Wernfried Domscheit

$bucket

pipeline = [
    {"$group": {"_id": "$product", "count": {"$sum": 1}}},
    {"$bucket": {
        "groupBy": "$count",
        "boundaries": [0, 11, 51, 100],
        "default": "Other",
        "output": {
            "count": {"$sum": 1},
        }
    }}
]
rows = list(tbl_athletes.aggregate(pipeline))
rows

$bucketAuto

pipeline = [
    {"$group": {"_id": "$product", "count": {"$sum": 1}}},
    {"$bucketAuto": {
        "groupBy": "$count",
        "buckets": 5,
        "output": {
            "count": {"$sum": 1},
        }
    }}
]
rows = list(tbl_athletes.aggregate(pipeline))
rows

NOTICE:

In $bucketdefault must be there.


Solution

  • Yes, you have the $bucket operator for that:

    db.collection.aggregate([
       {
          $bucket: {
             groupBy: "$count",
             boundaries: [0, 11, 51, 100],
             output: {
                count: { $sum: 1 },
             }
          }
       }
    ])
    

    Or use $bucketAuto where the intervals are generated automatically.