python mongodb pymongo mongoengine

Cannot properly deserialize a response using pymongo


I was using an API that was written in NodeJS, but for some reasons I had to rewrite the code in Python. The database is MongoDB, and the response to every query (with large results) includes a deserialized version with the id represented as, for example, {$oid: "f54h5b4jrhnf"}.

This ObjectId representation, with the nested $oid instead of the plain string that Node used to return, is breaking the front end. I haven't been able to find a way to get just the string rather than this nested object (other than iterating over every single document and extracting the id string) without also changing how the front end handles the response.

Is there a way to get a JSON response of the shape [{"_id": "63693f438cdbc3adb5286508", ...}]?

I tried using pymongo and mongoengine; both seem unable to deserialize this in a simple way.


Solution

  • You have several options (more than mentioned below).

    MongoDB

    In a MongoDB query, you could project/convert all ObjectIds to a string using "$toString".
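
As a minimal sketch of the server-side option, assuming MongoDB 4.0 or later (where "$toString" is available) and an aggregation pipeline rather than a plain find(): the helper name stringify_stage and the "owner" field are made up for illustration. The stage overwrites each listed field with its string form before the documents ever reach Python.

```python
def stringify_stage(fields=("_id",)):
    """Build an $addFields stage that replaces each listed field
    with the result of $toString applied to that same field."""
    return {"$addFields": {name: {"$toString": f"${name}"} for name in fields}}


# Pipeline that converts only the top-level _id.
pipeline = [stringify_stage()]

# Pipeline that also converts a hypothetical "owner" reference field.
pipeline_with_owner = [stringify_stage(("_id", "owner"))]

# Hypothetical usage, assuming `collection` is an existing pymongo Collection:
#   docs = list(collection.aggregate(pipeline))
#   docs[0]["_id"]  # a plain str instead of an ObjectId
```

Only the fields named in the stage are converted, so ObjectIds nested in arrays or sub-documents would still need their own expressions.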

    Python

    Iterate, like you mention in your question.
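
The iteration approach could look roughly like the sketch below. The function name stringify_values and the StandInId class are assumptions made for illustration: StandInId only mimics the one property of bson.ObjectId that matters here (str() returns the hex string), so the example runs without a database.

```python
import json


def stringify_values(value, types):
    """Return a copy of `value` in which every instance of `types`
    (a class or tuple of classes) is replaced by str(instance)."""
    if isinstance(value, types):
        return str(value)
    if isinstance(value, dict):
        return {key: stringify_values(item, types) for key, item in value.items()}
    if isinstance(value, list):
        return [stringify_values(item, types) for item in value]
    return value


class StandInId:
    """Hypothetical stand-in for bson.ObjectId, used only for this demo."""

    def __init__(self, hex_string):
        self._hex = hex_string

    def __str__(self):
        return self._hex


docs = [
    {
        "_id": StandInId("63693f438cdbc3adb5286508"),
        "refs": [StandInId("62ae621406926107b33b523c")],
    }
]
cleaned = [stringify_values(doc, StandInId) for doc in docs]
payload = json.dumps(cleaned)  # now serializes without a custom encoder

# With a real collection (assumed to exist), the call would be:
#   from bson import ObjectId
#   cleaned = [stringify_values(doc, ObjectId) for doc in collection.find()]
```

Because the walk is recursive, ObjectIds inside nested documents and lists are converted as well, at the cost of touching every value in every document.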

    --OR--

    You could also define/use a custom pymongo TypeRegistry class with a custom TypeDecoder and use it with a collection's CodecOptions so that every ObjectId read from a collection is automatically decoded as a string.

    Here's how I did it with a toy database/collection.

    from bson.codec_options import CodecOptions, TypeDecoder, TypeRegistry
    from bson.objectid import ObjectId
    from pymongo import MongoClient
    
    class myObjectIdDecoder(TypeDecoder):
        # BSON type this decoder is applied to when documents are read
        bson_type = ObjectId
    
        def transform_bson(self, value):
            # called for every ObjectId; return its 24-character hex string
            return str(value)
    
    type_registry = TypeRegistry([myObjectIdDecoder()])
    codec_options = CodecOptions(type_registry=type_registry)
    
    # assumption: a MongoDB server on localhost and a database named 'test';
    # substitute your own connection and database name
    db = MongoClient()['test']
    collection = db.get_collection('geojson', codec_options=codec_options)
    
    # the geojson collection _id field values have type ObjectId
    # but because of the custom CodecOptions/TypeRegistry/TypeDecoder
    # all ObjectIds are decoded as strings for python
    collection.find_one()["_id"]
    # returns '62ae621406926107b33b523c' I.e., just a string.