I am new to MongoDB and Pymongo, but have learnt some lessons from their Mongo university. I have a nested document, and I would like to only extract specific values from it. I am currently trying to extract the numeric part of PriceEGC but I wasn't successful. My code to construct the projection and extract specific values is shown below:
import os
import math
import pymongo
from pprint import pprint
from datetime import datetime
from bson.json_util import dumps
from bson.decimal128 import Decimal128
# more code above not shown
for collection in all_collections[:1]:
first_seen_date = collection.name.split("_")[-1]
projection = {
(more projections)...,
"RegoExpiryDate": "$Vehicle.Registration.Expiry",
"VIN": "$_id",
"ComplianceDate": None,
"PriceEGC": "$Price.FinalDisplayPrice", # <- this here is the problem
"Price": None,
"Reserve": "$Search.IsReservedDate",
"StartingBid": None,
"Odometer": "$Vehicle.Odometer",
(more projections)...
}
batch_size = 1
num_batches = math.ceil(collection.count_documents({}) / batch_size)
for num in range(1): # range(num_batches):
pipeline = [
{"$match": {}},
{"$project": projection},
{"$skip": batch_size * num},
{"$limit": batch_size},
]
aggregation = list(collection.aggregate(pipeline))
yield aggregation
if __name__ == "__main__":
print(dumps(next(get_all_collections()), indent=2))
A typical document looks like this:
What I do with the aggregation is I print out this single document, to see what it looks like first, before loading the entire collection somewhere.
The output I don't want is this:
[{
(more key-value pairs)...,
"RegoExpiryDate": "2021-08-31T00:00:00.000Z",
"VIN": "JTMRBREVX0D087618",
"ComplianceDate": null,
"PriceEGC": {
"$numberDecimal": "36268.00" # <- Don't want this
},
"Price": null,
"Reserve": null,
"StartingBid": null,
"Odometer": 54567,
(more key-value pairs)...
}]
The output I want is this:
[{
(more key-value pairs)...,
"RegoExpiryDate": "2021-08-31T00:00:00.000Z",
"VIN": "JTMRBREVX0D087618",
"ComplianceDate": null,
"PriceEGC": 36268.00, (or) "PriceEGC": "36268.00", # <- Want this
"Price": null,
"Reserve": null,
"StartingBid": null,
"Odometer": 54567,
(more key-value pairs)...
}]
How would I write the projection or the pipeline, so that I get what I want, as shown above? I've tried:
projection = {...,
"PriceEGC": "$Price.FinalDisplayPrice.$numberDecimal",
...
}
and
projection = {...,
"PriceEGC": {"$toDecimal": "$Price.FinalDisplayPrice"}
...
}
and
projection = {...,
"PriceEGC": Decimal128.to_decimal("$Price.FinalDisplayPrice")
...
}
and altering the pipeline as well
pipeline = [
{"$match": {}},
{"$project": projection},
{"$toDecimal": "$Price.FinalDisplayPrice"},
{"$skip": batch_size * num},
{"$limit": batch_size},
]
What you are seeing is the bson.json_util()
representation of a MongoDB Decimal128 object. For more information on this data type, see https://www.mongodb.com/developer/quickstart/bson-data-types-decimal128/
The bson.json_util()
function provides the $numberDecimal
wrapper so that the data type is preserved should you wish later to load the data back in.
If you want a different behaviour then you might want to use the regular json.dumps()
and override the Decimal128 behaviour, e.g.
from pymongo import MongoClient
from bson.decimal128 import Decimal128
import json
import bson
class CustomJsonEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, bson.decimal128.Decimal128):
return float(obj.to_decimal())
db = MongoClient()['mydatabase']
collection = db['mycollection']
collection.insert_one({'Price': {'FinalDisplayPrice': Decimal128('36268.00')}})
print(json.dumps(list(collection.find({}, {'_id': 0})), indent=4, cls=CustomJsonEncoder))
prints:
[
{
"Price": {
"FinalDisplayPrice": 36268.0
}
}
]