Search code examples
pythonmongodbaggregatepymongopercentile

Way to calculate the top x% of users from a database


I was wondering if there was an actually good way to aggregate/calculate percentiles based on all data in a database. For example, data like:

{
'name':'John'
'rank':5
},
{
'name':'John'
'rank':4
},
{
'name':'John'
'rank':5
},
{
'name':'James'
'rank':3
},
{
'name':'Froggy'
'rank':5
},

How would I go about finding a way to calculate the percentile that users achieve a specific rarity compared to others. Example: James falls in the top 10% of users who own rank 5.


Solution

  • emphasized textYou can do something like:

    db.collection.aggregate([
      {
        $group: {
          _id: "$name",
          rankFiveCount: {$sum: {$cond: [{$eq: ["$rank", 5]}, 1, 0]}}}
      },
      {
        $setWindowFields: {
          sortBy: {rankFiveCount: -1},
          output: {
            rank: {$rank: {}},
            totalCount: {$count: {}}
          }
        }
      },
      {$match: {_id: wantedUser}},
      {
        $project: {
          name: "$_id",
          _id: 0,
          percentile: {
            $round: {$multiply: [{$divide: ["$rank", "$totalCount"]}, 100]}
          }
        }
      }
    ])
    

    See how it works on the playground example