
PyMongo json_util.dumps: override ObjectId representation


bson.json_util provides functions to convert to either canonical or relaxed JSON format. However, both of them stick to the same representation of the ObjectId:

from pymongo import MongoClient
from bson.objectid import ObjectId
from bson import json_util
from bson.json_util import RELAXED_JSON_OPTIONS
from bson.json_util import CANONICAL_JSON_OPTIONS, DEFAULT_JSON_OPTIONS


db = MongoClient(URL)['DB_NAME']
mongo_query_result = db.collection.find_one({'_id': ObjectId('ID')}, 
                                                   {'_id': 1})

# returns {'_id': ObjectId('ID')}

print(json_util.dumps(mongo_query_result, json_options=RELAXED_JSON_OPTIONS))
print(json_util.dumps(mongo_query_result, json_options=CANONICAL_JSON_OPTIONS))
print(json_util.dumps(mongo_query_result, json_options=DEFAULT_JSON_OPTIONS))

# Results
{"_id": {"$oid": "ID"}}
{"_id": {"$oid": "ID"}}
{"_id": {"$oid": "ID"}}

# Desired Output
{"_id": "ID"}

The problem is that this doesn't match the results I get in the production environment. I am using PyMongo only to build test cases; the actual production format is

{'_id': "ID", ..etc}

I looked through the documentation, and here is what I found:

  1. Both CANONICAL_JSON_OPTIONS and RELAXED_JSON_OPTIONS stick to the problematic representation, per the Extended JSON specs.
  2. I can't seem to override it, because uuid_representation=PYTHON_LEGACY, and I cannot find a way around that.

Am I missing a way to convert a PyMongo query result to:

{'_id' : 'ID', ..}
# not
{'_id' : {'$oid' : 'ID'}, ..}

I would hate to extend my code just to handle the different format of the test cases.


Solution

  • As a workaround, I was able to get the desired result with regular expressions from the re module:

    import re
    from bson import json_util

    # Matches {"$oid": "<id>"} and captures the quoted id.
    OID_PATTERN = re.compile(r'\{\s*"\$oid":\s*("[a-z0-9]+")\s*\}')

    def remove_oid(string):
        # Replace every {"$oid": "<id>"} wrapper with the bare quoted id.
        return OID_PATTERN.sub(r'\1', string)

    string = json_util.dumps(mongo_query_result)
    string = remove_oid(string)
    

    This strips the Extended JSON wrapper, replacing each {"$oid": "ID"} key-value pair with just the "ID" value.

    Although this gets the job done, it is not ideal: manipulating JSON as a string is error-prone, and the pattern does not handle dates or other Extended JSON types.
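
One way to avoid editing the JSON as a string might be to parse it first and unwrap the {"$oid": ...} wrappers on the parsed structure. The following is only a sketch under a few assumptions: unwrap_oid is a hypothetical helper (not part of PyMongo or bson), the sample ids are made up, and the literal string stands in for the output of json_util.dumps(mongo_query_result). It only handles $oid; dates and other Extended JSON types would need their own branches.

```python
import json

def unwrap_oid(value):
    """Recursively replace {"$oid": "<id>"} wrappers with the bare id string."""
    if isinstance(value, dict):
        # A dict whose only key is "$oid" is an Extended JSON ObjectId wrapper.
        if set(value) == {"$oid"}:
            return value["$oid"]
        return {key: unwrap_oid(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap_oid(item) for item in value]
    return value

# Stand-in for json_util.dumps(mongo_query_result); the ids are invented.
extended_json = (
    '{"_id": {"$oid": "5f1d7f3e2c3b4a0012345678"}, '
    '"refs": [{"$oid": "5f1d7f3e2c3b4a0012345679"}]}'
)

flattened = json.dumps(unwrap_oid(json.loads(extended_json)))
print(flattened)
# {"_id": "5f1d7f3e2c3b4a0012345678", "refs": ["5f1d7f3e2c3b4a0012345679"]}
```

Because the wrapper is recognized by its shape rather than by a text pattern, nested documents and arrays are covered, and string values that merely contain the text "$oid" are left alone.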