Search code examples
mongodbmongodb-queryaggregation-frameworkpymongoaggregate-functions

Nested grouping in MongoDB aggregation


Context:

I have a MongoDB full of Documents like this, which I want to dumb into one grouped json:

[
  {
    "_id": "615dc97907f597330c510279",
    "code": "SDFSDFSDF",
    "location": "ABC1",
    "week_number": 40,
    "year": 2021,
    "region": "NA"
  },

  ....

  {
    "_id": "615dc97907f597330c51027a",
    "code": "SDFSGSGR",
    "location": "ABC1",
    "week_number": 40,
    "year": 2021,
    "region": "EU"
  },

  ....

  {
    "_id": "615dc97607f597330c50ff50",
    "code": "GGSFHSFS",
    "location": "DEF2",
    "week_number": 40,
    "year": 2021,
    "region": "EU",
    "audit_result": {
      "issues_found": true,
      "comment": "comment."
    }
  }
]

I am trying to write an aggregation which should return and object like this:

{
  [
    "EU": {
      2021: {
        40: {
          "ABC1": {
            (All documents for location ABC1 and week 40, year 2021 and region EU)
          }
        },
        39: {
          ....
        }
      },
      2020: {
        ....
      }
    },
    "NA": {
      ....
    }
  ]
}

Problem:

I am not 100% sure how.

I started grouping them by region but I am not sure how to proceed after the first group. I tried grouping them by location first and group my way up to region but that also does not seem to work as I expected it.

The docs don't talk about a case like this and examples I find only group by one or two things, not four.

any insights highly appreciated :)


Solution

  • Using dynamic values as field name is generally considered as anti-pattern and you should avoid that. You are likely to introduce unnecessary difficulty to composing and maintaining your queries.

    Nevertheless, you can do the followings in an aggregation pipeline:

    1. $group at the finest level: region, year, week_number, location; $addToSet to group all the $ROOT document into an array named v
    2. $group at 1 coarser level: region, year, week_number; create k-v tuples that k is the location and v is the v from step 1. Use $addToSet to group the k-v tuples into an array named v
    3. use $arrayToObject to convert your k-v tuples into fields with dynamic values e.g.
    "ABC" : [
        {
            "_id": "615dc97907f597330c510279", 
            ...
        }, 
        ...
    ]
    
    1. Basically repeating step 2 & 3 at 1 coarser level: region, year; create k-v tuples that k is the location and v is the v from step 3. Use $addToSet to group the k-v tuples into an array named v
    2. Repeat step 4 at 1 coarser level: region
    3. $group unconditionally (i.e. $group by _id: null); repeating previous step to put the results into a single array named v; use $arrayToObject to convert it again
    4. $replaceRoot to obtain your expected result

    Here is one small note: when $arrayToObject for numeric k value like year and week_number, the k value needs to be converted into String beforehand. You can use $toString to achieve this.

    db.collection.aggregate([
      {
        "$group": {
          "_id": {
            region: "$region",
            year: "$year",
            week_number: "$week_number",
            location: "$location"
          },
          "v": {
            "$addToSet": "$$ROOT"
          }
        }
      },
      {
        $group: {
          _id: {
            region: "$_id.region",
            year: "$_id.year",
            week_number: "$_id.week_number"
          },
          v: {
            "$addToSet": {
              k: "$_id.location",
              v: "$v"
            }
          }
        }
      },
      {
        "$addFields": {
          "v": {
            "$arrayToObject": "$v"
          }
        }
      },
      {
        $group: {
          _id: {
            region: "$_id.region",
            year: "$_id.year"
          },
          v: {
            "$addToSet": {
              k: {
                "$toString": "$_id.week_number"
              },
              v: "$v"
            }
          }
        }
      },
      {
        "$addFields": {
          "v": {
            "$arrayToObject": "$v"
          }
        }
      },
      {
        $group: {
          _id: {
            region: "$_id.region"
          },
          v: {
            "$addToSet": {
              k: {
                "$toString": "$_id.year"
              },
              v: "$v"
            }
          }
        }
      },
      {
        "$addFields": {
          "v": {
            "$arrayToObject": "$v"
          }
        }
      },
      {
        $group: {
          _id: null,
          v: {
            "$addToSet": {
              k: "$_id.region",
              v: "$v"
            }
          }
        }
      },
      {
        "$addFields": {
          "v": {
            "$arrayToObject": "$v"
          }
        }
      },
      {
        "$replaceRoot": {
          "newRoot": "$v"
        }
      }
    ])
    

    Here is the Mongo playground for your reference.