Search code examples
mongodbspring-data-jpaspring-data-mongodb

Mongo JPA: Multiple Group Operations


Given the following dataset:

category date amount
fruit 2023-01-01 1
fruit 2023-01-02 2
meat 2023-01-03 3
nuts 2023-01-04 4
nuts 2023-01-05 5
nuts 2023-01-06 6

Is it possible to do a multi-group aggregation in Mongo JPA that yields:

  • One row per category
  • Count of rows in that category
  • Amount on max(date)
category count last_amount
fruit 2 2
meat 1 3
nuts 3 6

This should be possible in a relational database with some SQL grouping and possibly sub-queries but seems incredibly difficult to achieve in Mongo. I could get get the row count in one aggregation but seems like I would need multiple operations to achieve what I need.


Solution

  • This was genuinely surprisingly difficult to achieve. Charchit pointed me in the right direction but to get to the exact response I wanted required so much more effort.

    Explanation of stages required:

    • $group: group the collection by category and obtain the $sum and the $max($date) of each category
    • $lookup: join the collection with itself to obtain the amount at the maxDate
    • the $lookup pipeline is required because a multi-field match needs to be performed ensuring the match is for the maxDate only for that given category
    • the $lookup let is required because $lookup pipeline somehow cannot reference a previous stage field directly. note the double $ required to reference let variables like $$maxDate
    • $project is used because we are only interested in the amount, so the rest of the $lookup document needs to be discarded
    • $replaceRoot + $mergeObjects + $arrayElemAt is used to merge the first entry in the $lookup result up into the root document
    • the final $project removes the (now) unneeded maxDate and latestAmount working fields

    Here's the resulting query that meets the exact specification in the question:

    db.collection.aggregate([
      {
        $group: {
          _id: "$category",
          count: { $sum: 1 },
          maxDate: { $max: "$date" }
        }
      },
      {
        $lookup: {
          from: "collection",
          let: {
            category: "$_id",
            maxDate: "$maxDate"
          },
          pipeline: [
            { $match: {
                $expr: {
                  $and: [
                    { $eq: [ "$category", "$$category" ] },
                    { $eq: [ "$date", "$$maxDate" ] }
                  ]
                }
            }},
            { $project: { _id: 0, amount: 1 } }
          ],
          as: "latestAmount"
        }
      },
      {
        $replaceRoot: {
          newRoot: {
            $mergeObjects: [
              { $arrayElemAt: [ "$latestAmount", 0 ] }, "$$ROOT"
            ]
          }
        }
      },
      { $project: { maxDate: 0, latestAmount: 0 } }
    ])
    

    Playground