Search code examples
mongodbaggregation-frameworkaws-documentdb

Mongo db aggregation - $push and $slice top results


I have the following documents in my db:

{uid: 1, score: 10}
{uid: 2, score: 11}
{uid: 3, score: 1}
{uid: 4, score: 6}
{uid: 5, score: 2}
{uid: 6, score: 3}
{uid: 7, score: 8}
{uid: 8, score: 10}

I want to split them into buckets by score - i.e.:

score uids (bucket name in aggregation)
[0,4) 3,5,6 0
[4,7) 4 4
[7,inf 1,2,7,8 7

For this, I created the following aggregation which works just fine:

db.scores.aggregation(
    [
        {
            $bucket: 
                {
                    groupBy: "$score",
                    boundaries: [0, 4, 7],
                    default: 7,
                    output:
                        {
                            "total": {$sum: 1},
                            "top_frustrated": 
                                {
                                    $push: {
                                        "uid": "$uid", "score": "$score"
                                    }
                                },
                        },
                }
        },
    ]
)

However, I would like to return only the top 3 of every bucket - i.e, buckets 0, 4 should be the same, but bucket 7 should have only uids 1,2,8 returned (as uid 7 has the lowest score) - but to include the total count of documents as well, i.e. output of bucket "7" should look like:

{ "total" : 4, "top_scores" : 
    [
        {"uid" : 2, "score" : 11},
        {"uid" : 1, "score" : 10},
        {"uid" : 8, "score" : 10},
    ]
}

I tried using $addFields with $sortArray and $slice, but it either won't work or return errors.

I can of course use $project but I was wondering if there is a more efficient way.

I am using Amazon DocumentDB.


Solution

  • You can use the $topN accumulator, instead of $push, like this:

    db.collection.aggregate([
      {
        "$bucket": {
          "groupBy": "$score",
          "boundaries": [
            0,
            4,
            7
          ],
          "default": 7,
          "output": {
            "total": {
              "$sum": 1
            },
            "top_frustrated": {
              "$topN": {
                "n": 3,
                "sortBy": {
                  "score": -1
                },
                "output": {
                  "uid": "$uid",
                  "score": "$score"
                }
              }
            }
          },  
        }
      },
    ])
    

    Playground link.

    The only catch here is this operator is present in MongoDB 5.2 and above.

    For older versions, this will work:

    db.collection.aggregate([
      {
        "$sort": {
          score: -1
        }
      },
      {
        $bucket: {
          groupBy: "$score",
          boundaries: [
            0,
            4,
            7
          ],
          default: 7,
          output: {
            "total": {
              $sum: 1
            },
            "top_frustrated": {
              $push: {
                "uid": "$uid",
                "score": "$score"
              }
            },
            
          },
          
        }
      },
      {
        "$project": {
          total: 1,
          top_frustrated: {
            "$slice": [
              "$top_frustrated",
              3
            ]
          }
        }
      }
    ])
    

    Playground link.