Search code examples
databasemongodbnosqlaggregation-frameworkaggregate

How to get the correct percentage from array of objects with aggregation pipeline


I have documents with an array of objects, and I need to do the following:

  • Get the total points of member arrays. On the example, the 2 documents have a total of 5500 points.
  • Next, get each document's members total points (for Josh it's 2000, for Carl it's 3500) and then compute what percent it is against the total points. Formula is: (members' points / total points) * 100 or (2000 / 5500) * 100 for Josh.
  • Finally, return the fields _id, leader, members' points, and percentage fields.

It should look something like below:

{
  _id: '00001',
  leader: "Josh",
  memberpoints: 2000,
  percentage: 36.3636
},
{
  _id: '00002',
  leader: "Carl",
  memberpoints: 3500,
  percentage: 63.6363
}

These are the documents:

[{
  "_id": { "$oid": "00001" },
  "leader": "Josh",
  "members": [
    {
      "name": "Person A",
      "points": 500
    },
    {
      "name": "Person B",
      "points": 500
    },
    {
      "name": "Person C",
      "points": 1000
    }]
  },
  {
  "_id": { "$oid": "00002" },
  "leader": "Carl",
  "members": [
    {
      "name": "Person D",
      "points": 1000
    },
    {
      "name": "Person E",
      "points": 1000
    },
    {
      "name": "Person F",
      "points": 1500
    }]
}]

But, I'm getting the wrong percentage because it seems like it's only getting a single object from the members array. It might have something to do with the $set part of my aggregation pipeline, specifically [{ $divide: ["$members.points", "$total"] }, 100], but I'm not sure.

Wrong percentage, should be 36.3636

{
  _id: '00001',
  leader: "Josh",
  memberpoints: 2000,
  percentage: 18.1818
}

Here is my MongoDB aggregation pipeline:

db.users.aggregate([
{$unwind: "$members"}, 
{$setWindowFields: { output: { total: { $sum: "$members.points" }}}},
{$set: { percentage: { $multiply: [{ $divide: ["$members.points", "$total"] }, 100] }}},
{$group: { _id: "$_id", leader: {$first: "$leader"}, memberpoints: {$sum: "$members.points"}, percentage: {$first: "$percentage"}}}])

Solution

  • Your approach is almost correct; just the order of $set and $group are reversed. Here is a pipeline that produces the desired output with some comments:

    db.foo.aggregate([
        {$unwind: "$members"},
    
        // Great way to get the sum of everything in one pass without having
        // $group get in the way:
        {$setWindowFields: { output: { total: { $sum: "$members.points" }}}},
    
        // Basically home free.  Now re-group, summing the points.  The trick
        // here is to NOT lose the total amount; we will need it after we $sum.
        // We will call it 'percentage' here but only as a placeholder; we
        // will overwrite it with the REAL percentage in the next stage:
        {$group: {_id: "$_id",
                  leader: {$first: "$leader"},
                  memberpoints: {$sum: "$members.points"},
                  percentage: {$first: "$total"}  // not really pct yet.
                 }},
    
        // Now turn it into REAL pct.  The overwrite trick allows us to
        // to not have to unset a "temporary" total value.
        {$addFields: {
            percentage: {$multiply: [100, {$divide: ['$memberpoints', '$percentage']}]}
        }}
    )]
    

    Here is a fancy version that avoids the $unwind and re-$group by letting $reduce sum the memberpoints. Eliminating $unwind/$group stages is not to be underestimated. With a sample set of 1,000,000 docs, the solution above runs in 13128ms (avg); the more compact version below runs in just 6020ms -- TWICE as fast:

    db.foo.aggregate([
    
        {$setWindowFields: {
            output: {
              total: { $sum: {$reduce: {
                             input: "$members",
                             initialValue: 0,
                             in: {$add: [ "$$value", "$$this.points"]}
                         }}
                     }
              }
        }}
    
        ,{$project: {
            _id: true,
            leader: true,
            memberpoints: {$sum:'$members.points'},
            percentage: {$multiply:[100,{$divide: [{$sum:'$members.points'},'$total']}]}
        }}
    
    ]);