Search code examples
mongodbmongodb-queryaggregation-framework

Group embedded array of documents only


in MongoDB I want to group an array of documents that is nested in another document without it affecting the parent document.

Database:

db={
      "users": [
        {
          "firstName": "David",
          "lastName": "Mueller",
          "messages": [
            {
              "text": "hello",
              "type": "PERSONAL"
            },
            {
              "text": "test",
              "type": "DIRECT"
            }
          ]
        },
        {
          "firstName": "Mia",
          "lastName": "Davidson",
          "messages": [
            {
              "text": "hello world",
              "type": "DIRECT"
            },
            {
              "text": ":-)",
              "type": "PERSONAL"
            },
            {
              "text": "hi there",
              "type": "DIRECT"
            }
          ]
        }
      ]
    }

Desired result:

[
      {
        "firstName": "David",
        "lastName": "Mueller",
        "messages": [
          {
            "_id": "PERSONAL",
            "count": 1
          },
          {
            "_id": "DIRECT",
            "count": 1
          }
        ]
      },
      {
        "firstName": "Mia",
        "lastName": "Davidson",
        "messages": [
          {
            "_id": "PERSONAL",
            "count": 1
          },
          {
            "_id": "DIRECT",
            "count": 2
          }
        ]
      }
]

If I have an array of ids I already know how to do it using the internal pipeline of $lookup, but my question is how can I do that with an array of embedded documents.


This is an example of a working grouping on an array with ids using lookup. This is not the solution because the question is about an embedded document array and not an array of ids. This example is only provided to show that I can archive the desired result when ids instead of embedded documents are stored in an array.

Database for grouping with lookup:

db={
  "users": [
    {
      "firstName": "David",
      "lastName": "Mueller",
      "messages": [
        1,
        2
      ]
    },
    {
      "firstName": "Mia",
      "lastName": "Davidson",
      "messages": [
        3,
        4,
        5
      ]
    }
  ],
  "messages": [
    {
      "_id": 1,
      "text": "hello",
      "type": "PERSONAL"
    },
    {
      "_id": 2,
      "text": "test",
      "type": "DIRECT"
    },
    {
      "_id": 3,
      "text": "hello world",
      "type": "DIRECT"
    },
    {
      "_id": 4,
      "text": ":-)",
      "type": "PERSONAL"
    },
    {
      "_id": 5,
      "text": "hi there",
      "type": "DIRECT"
    }
  ]
}

Aggregation of grouping with lookup:

db.users.aggregate([
  {
    "$lookup": {
      "from": "messages",
      "localField": "messages",
      "foreignField": "_id",
      "as": "messages",
      "pipeline": [
        {
          "$group": {
            "_id": "$type",
            "count": {
              "$sum": 1
            }
          }
        }
      ]
    }
  }
])

Result of grouping with lookup (which is the desired result):

[
  {
    "_id": ObjectId("5a934e000102030405000005"),
    "firstName": "David",
    "lastName": "Mueller",
    "messages": [
      {
        "_id": "PERSONAL",
        "count": 1
      },
      {
        "_id": "DIRECT",
        "count": 1
      }
    ]
  },
  {
    "_id": ObjectId("5a934e000102030405000006"),
    "firstName": "Mia",
    "lastName": "Davidson",
    "messages": [
      {
        "_id": "PERSONAL",
        "count": 1
      },
      {
        "_id": "DIRECT",
        "count": 2
      }
    ]
  }
]

This example in the MongoDB playground


Now back to the issue: I want to archive the same result but with an embedded document array as provided at the top.

I cannot find out how to do this (I tried AI, lot's of google searches and other forums without success, you are my last resource before giving up), I know I can filter an embedded array using $addField and $fitler but not how I can group just the embedded array.

Please note that this is just a simple example my real data structure looks different and might also use other grouping functions like min, sum etc. but I just wanted to know a general way of archieving the same thing as when I use the lookup.

I appreciate any help with this and thank you 🙂


Solution

    1. Unwind messages
      • Use preserveNullAndEmptyArrays: true in case the field is missing or empty.
    2. Group by the _id (presumably userID) and message type; and set countType with $count.
    3. Then re-group by _id only and use the first doc (since it's the same for non-message fields)
      • Push each {type: ..., count: countType} into a messages array.
      • Note that the documents are back to being one per user/_id
    4. Set the doc.messages to the array messages which was pushed in the previous step.
      • Check for the first message not having a type - this occurs when the original document had missing or empty messages. If it is, then set it to the empty array, otherwise, use as-is.
      • Use any default you want, like [{ _id: null, count: 0 }]
      • (I've added an example doc in the playground which has one message but since it has a type it doesn't meet this criteria and is set correctly.)
    5. Replace the root with the new doc which has all the correct info
    6. (Optionally, sort on _id if you need it.)
    db.users.aggregate([
      {
        $unwind: {
          path: "$messages",
          preserveNullAndEmptyArrays: true
        }
      },
      {
        $group: {
          _id: {
            _id: "$_id",
            type: "$messages.type"
          },
          countType: { $count: {} },
          doc: { $first: "$$ROOT" }
        }
      },
      {
        $group: {
          _id: "$_id._id",
          doc: { $first: "$doc" },
          messages: {
            $push: {
              _id: "$_id.type",
              count: "$countType"
            }
          }
        }
      },
      {
        $set: {
          "doc.messages": {
            // set it to some default missing message type
            $cond: {
              // `messages._id` will be missing
              if: {
                $eq: [
                  { $type: { $first: "$messages._id" } },
                  "missing"
                ]
              },
              // put whichever "default" you want
              // like `[{ _id: null, count: 0 }]`
              then: [],
              else: "$messages"
            }
          }
        }
      },
      { $replaceWith: "$doc" }
    ])
    

    Mongo Playground

    Mongo Playground with minNumber

    Previous demo playground if messages is never missing and never empty.