Search code examples
mongodbmongodb-queryaggregation-frameworknosql-aggregation

Creating objects from array of objects in a grouped mongo aggregation


I have been writing an aggregation pipeline to show a summarized version of data from a collection.

Sample Structure of Document:

{
    _id: 'abcxyz',
    eventCode: 'EVENTCODE01',
    eventName: 'SOMEEVENT',
    units: 1,
    rate: 2,
    cost: 2,
    distribution: [
        {
            startDate: 2021-05-31T04:00:00.000+00:00
            units: 1
        }
    ]
}

I have grouped it and merged the distribution into a single list with $unwind step before $group:

[
    $unwind: {
        path: '$distribution',
        preserveNullAndEmptyArrays: false
    },
    $group: {
        _id: {
            eventName: '$eventName',
            eventCode: '$eventCode'
        },
        totalUnits: {
            $sum: '$units'
        },
        distributionList: {
            $push: '$distribution'
        },
        perUnitRate: {
            $avg: '$rate'
        },
        perUnitCost: {
            $avg: '$cost'
        }
    }
]

Sample Output:

{
    _id: {
        eventName: 'EVENTNAME101'
        eventCode: 'QQQ'
    },
    totalUnits: 7,
    perUnitRate: 2,
    perUnitCost: 2,
    distributionList: [
        {
            startDate: 2021-05-31T04:00:00.000+00:00,
            units: 1
        },
        {
            startDate: 2021-05-31T04:00:00.000+00:00,
            units: 1
        },
        {
            startDate: 2021-06-07T04:00:00.000+00:00,
            units: 1
        }
    ]
}

I'm getting stuck at the next step; I want to consolidate the distributionList into a new List with no repeating startDate.

Example: Since first 2 objects of distributionList have the same startDate, it should be a single object in output with sum of units:

Expected:

{
    _id: {
        eventName: 'EVENTNAME101'
        eventCode: 'QQQ'
    },
    totalUnits: 7,
    perUnitRate: 2,
    perUnitCost: 2,
    newDistributionList: [
        {
            startDate: 2021-05-31T04:00:00.000+00:00,
            units: 2 //units summed for first 2 objects
        },
        {
            startDate: 2021-06-07T04:00:00.000+00:00,
            units: 1
        }
    ]
}

I couldn't use $unwind or $bucket as I intend to keep the grouping I did in previous steps ($group).

Can I get suggestions or a different approach if this doesn't seem accurate?


Solution

  • You may want to do the first $group at eventName, eventCode, distribution.startDate level. Then, you can $group again at eventName, eventCode level and using $first to keep your original $group fields.

    db.collection.aggregate([
      {
        $unwind: {
          path: "$distribution",
          preserveNullAndEmptyArrays: false
        }
      },
      {
        $group: {
          _id: {
            eventName: "$eventName",
            eventCode: "$eventCode",
            distributionStartDate: "$distribution.startDate"
          },
          totalUnits: {
            $sum: "$units"
          },
          distributionUnitsSum: {
            $sum: "$distribution.units"
          },
          perUnitRate: {
            $avg: "$rate"
          },
          perUnitCost: {
            $avg: "$cost"
          }
        }
      },
      {
        $group: {
          _id: {
            eventName: "$_id.eventName",
            eventCode: "$_id.eventCode"
          },
          totalUnits: {
            $first: "$totalUnits"
          },
          newDistributionList: {
            $push: {
              startDate: "$_id.distributionStartDate",
              units: "$distributionUnitsSum"
            }
          },
          perUnitRate: {
            $first: "$perUnitRate"
          },
          perUnitCost: {
            $first: "$perUnitCost"
          }
        }
      }
    ])
    

    Here is the Mongo Playground to show the idea for your reference.