Search code examples
pythonmongodbpymongo

PyMongo/MongoDB - Grouping with many null values after unwinding - possible?


So I got my input data, which looks like this (example excerpt of one document, truncated - there are a lot more attributes):

{'_id': ObjectId('62bab00ea4b4ebd48908457d'),
 'id': 'xxx1234',
 'masterVariant':
  attributes': [
   {'name': 'propertyA', 'value': 'CJS'},
   {'name': 'propertyB', 'value': 300},
   {'name': 'propertyC', 'value': 221},
   {'name': 'propertyE', 'value': 'dasdags'}
  ]
 }

What I'm trying to do is to create a pandas DataFrame out of this data with following operation (using aggregate_pandas_all(), ignore that I'm only using aggregate() here).

list(coll.aggregate([
    {'$unwind': '$masterVariant.attributes'},
    
    {'$project': {
        'id': '$id',
        'propertyA': {
            '$cond': [{'$eq': ['$masterVariant.attributes.name', 'propertyA']}, '$masterVariant.attributes.value', 'None']
        },
        'propertyE': {
            '$cond': [{'$eq': ['$masterVariant.attributes.name', 'propertyE']}, '$masterVariant.attributes.value', 'None']
        },  
    }},
    { '$group': {
        '_id': '$_id',
        'id': {'$first': '$id'},
        'propertyA': {'$first': '$propertyA'},
        'propertyE': {'$first': '$propertyE'},
    }
    }
]

))

I'm trying to extract several of the attributes and create a tuple/row out of it, basically in this case the result should look like this:

id propertyA propertyE
xxx1234 'CJS' 'dasdags'

Issue I'm getting is rows full of 'None' entries. This is due to the fact that unwinding creates a lot of documents for every property/attribute there is and I can't properly use the $first (or $last) aggregation operator. As far as I understand the $unwind operation it would yield something like this (without $group) - correct me if I'm wrong:

| id | propertyA | propertyE |
| -- | --------- | ----------|
| xxx1234| 'CJS'     | 'None'    |
| xxx1234| 'None'    | 'dasdags' |

Anyone know if what I want to achieve is actually possible (easily?)? Hope I somehow formulated my question clearly.


Solution

  • The issue is you are getting null because of $first. It will only take the first element of the array. To resolve this in the $group stage you can use $push with $cond this will ignore null values.

    Then add one more $project stage to get the final output

    Complete Aggregation query

    db.collection.aggregate([
      {
        "$unwind": "$masterVariant.attributes"
      },
      {
        "$project": {
          "id": "$id",
          "propertyA": {
            "$cond": [
              {
                "$eq": [
                  "$masterVariant.attributes.name",
                  "propertyA"
                ]
              },
              "$masterVariant.attributes.value",
              null
            ]
          },
          "propertyE": {
            "$cond": [
              {
                "$eq": [
                  "$masterVariant.attributes.name",
                  "propertyE"
                ]
              },
              "$masterVariant.attributes.value",
              null
            ]
          },
          
        }
      },
      {
        "$group": {
          "_id": "$_id",
          "id": {
            "$first": "$id"
          },
          "propertyA": {
            "$push": {
              "$cond": [
                {
                  "$ne": [
                    "$propertyA",
                    null
                  ]
                },
                "$propertyA",
                "$$REMOVE"
              ]
            }
          },
          "propertyE": {
            "$push": {
              "$cond": [
                {
                  "$ne": [
                    "$propertyE",
                    null
                  ]
                },
                "$propertyE",
                "$$REMOVE"
              ]
            }
          },
          
        }
      },
      {
        "$project": {
          "id": "$id",
          "propertyA": {
            "$first": "$propertyA"
          },
          "propertyE": {
            "$first": "$propertyE"
          },
          
        }
      }
    ])
    

    Try in Mongodb Playground