Search code examples
pythonmongodbpymongobsondollar-sign

mongodb find returns json object with keys that start with unwanted dollar sign ($date, $binary..)


I am using python 3.9.12 to query mongodb, I then read the values into variables and continue with my logic. Problem is, some of my values have keys that start with dollar sign. Here is an example of a json I get:

[
    {
        "_id": {
            "$oid": "234876234875236752309823"
        },
        "createdAt": {
            "$date": "2022-11-13T20:50:18.184Z"
        },
        "moreFields": {
            "key1": "blabla1",
            "key2": "blabla2",
            "key3": "blabla3"
        },
        "entityId": {
            "$binary": {
                "base64": "z0kWDTHiSlawpI2wHjyrWA==",
                "subType": "04"
            }
        }
    }
]

I understand that those mongodb field types (bson, datetime...). But this makes my life hard in trying to access those values using python.

I was reading and looking but I couldn't find a method to convert them to "normal" keys. Ideally I would want to correct my mongodb query (get datetime as strings and $binary as UUID strings). I have found a stupid workaround in python but unfortunately it is very stupid and I want to correct my ways.

Any ideas? Thanks :)

I would really be happy if the result of my mongodb query would change to:

[
    {
        "_id": "234876234875236752309823",
        "createdAt": "2022-11-13T20:50:18.184Z",
        "moreFields": {
            "key1": "blabla1",
            "key2": "blabla2",
            "key3": "blabla3"
        },
        "entityId": "e87b22b2-ea15-4176-9100-c65f79f0e5b2"
    }
]

Solution

  • If your data is in a string format (say, from a file), use loads from the bson.json_util module. https://pymongo.readthedocs.io/en/stable/api/bson/json_util.html

    For the second part, that is just formatting; but beware, this just creates another string output. Chances are the data you are interested in is actually in the record object.

    The following snippet converts the input string, loads it into MongoDB, and then formats it back to a string using a custom encoder:

    import datetime
    import json
    import bson
    from bson import json_util
    from pymongo import MongoClient
    
    db = MongoClient()['mydatabase']
    
    records = '''[
        {
            "_id": {
                "$oid": "234876234875236752309823"
            },
            "createdAt": {
                "$date": "2022-11-13T20:50:18.184Z"
            },
            "moreFields": {
                "key1": "blabla1",
                "key2": "blabla2",
                "key3": "blabla3"
            },
            "entityId": {
                "$binary": {
                    "base64": "z0kWDTHiSlawpI2wHjyrWA==",
                    "subType": "04"
                }
            }
        }
    ]'''
    
    
    db.mycollection.insert_many(json_util.loads(records))
    
    
    class MyJsonEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, datetime.datetime):
                return obj.isoformat()                # Format dates as ISO strings
            elif isinstance(obj, bson.Binary) and obj.subtype == bson.binary.UUID_SUBTYPE:
                return obj.as_uuid()                  # Format binary data as UUIDs
            elif hasattr(obj, '__str__'):
                return str(obj)                       # This will handle ObjectIds
    
            return super(MyJsonEncoder, self).default(obj)
    
    
    record = db.mycollection.find_one()
    print(json.dumps(record, cls=MyJsonEncoder, indent=4))
    

    prints:

    {
        "_id": "234876234875236752309823",
        "createdAt": "2022-11-13T20:50:18.184000",
        "moreFields": {
            "key1": "blabla1",
            "key2": "blabla2",
            "key3": "blabla3"
        },
        "entityId": "cf49160d-31e2-4a56-b0a4-8db01e3cab58"
    }