Search code examples
mongodbgroupingpymongo

MongoDB Grouping and counting by composite field


Here's what my records look like:

I have obtained this by using collection.find().limit(1)

[
  {
    "_id": {"$oid": "..."},
    "husband.firstName": "John",
    "husband.secondName": "Smith",
    "wife.firstName": "Alice",
    "wife.secondName": "Watson",
    "...": "...",
  }
]

The husband and wife fields only contains firstName and secondName

I want to count how common each husband and wife name combos are.

I imagine the results in the form of something like:

[
  {
    "husband.firstName": "John",
    "husband.secondName": "Smith",
    "wife.firstName": "Alice",
    "wife.secondName": "Watson",
    "count": "456",
  },
  {
    "husband.firstName": "Jack",
    "husband.secondName": "Smith",
    "wife.firstName": "Alice",
    "wife.secondName": "Watson",
    "count": "123",
  }
]

I'm using Python and pymongo so I have tried the following:

pipeline = [
    {
        "$group": {
            "_id": {
                "husband": "$husband",
                "wife": "$wife"
            },
            "count": {"$sum": 1}
        }
    },
    {
        "$sort": {"count": -1}
    },
]

However this returns an empty result of:

[
  {
    "_id": "{}",
    "count": 47553
  }
]

I tried also grouping by the fields separately but the result was the same.


Solution

  • As pointed out in the comments by @cmgchess, the main difficulty for your case is the dot in your field name. You may want to refactor your schema to something like below:

    {
        "husband": {
          "firstName": "John",
          "secondName": "Smith"
        },
        "wife": {
          "firstName": "Alice",
          "secondName": "Watson"
        }
      }
    

    Nevertheless, for your current schema, you may workaround it through usage of $getField.

    db.collection.aggregate([
      {
        "$group": {
          "_id": {
            "husband": {
              "firstName": {
                "$getField": {
                  "field": "husband.firstName",
                  "input": "$$ROOT"
                }
              },
              "secondName": {
                "$getField": {
                  "field": "husband.secondName",
                  "input": "$$ROOT"
                }
              }
            },
            "wife": {
              "firstName": {
                "$getField": {
                  "field": "wife.firstName",
                  "input": "$$ROOT"
                }
              },
              "secondName": {
                "$getField": {
                  "field": "wife.secondName",
                  "input": "$$ROOT"
                }
              }
            }
          },
          "count": {
            "$sum": 1
          }
        }
      },
      {
        "$sort": {
          "count": -1
        }
      }
    ])
    

    Mongo Playground