Search code examples
pythonmongodbpymongoaggregationprojection

Project a numberDecimal value into a float in Python


I am new to MongoDB and Pymongo, but have learnt some lessons from their Mongo university. I have a nested document, and I would like to only extract specific values from it. I am currently trying to extract the numeric part of PriceEGC but I wasn't successful. My code to construct the projection and extract specific values is shown below:

import os
import math
import pymongo
from pprint import pprint
from datetime import datetime
from bson.json_util import dumps
from bson.decimal128 import Decimal128

# more code above not shown
for collection in all_collections[:1]:
    first_seen_date = collection.name.split("_")[-1]
    projection = {
        (more projections)...,
        "RegoExpiryDate": "$Vehicle.Registration.Expiry",
        "VIN": "$_id",
        "ComplianceDate": None,
        "PriceEGC": "$Price.FinalDisplayPrice",  # <- this here is the problem
        "Price": None,
        "Reserve": "$Search.IsReservedDate",
        "StartingBid": None,
        "Odometer": "$Vehicle.Odometer",
        (more projections)...
    }

    batch_size = 1
    num_batches = math.ceil(collection.count_documents({}) / batch_size)

    for num in range(1):  # range(num_batches):
        pipeline = [
            {"$match": {}},
            {"$project": projection},
            {"$skip": batch_size * num},
            {"$limit": batch_size},
        ]
        aggregation = list(collection.aggregate(pipeline))
        yield aggregation

if __name__ == "__main__":
    print(dumps(next(get_all_collections()), indent=2))

A typical document looks like this: typical document

What I do with the aggregation is I print out this single document, to see what it looks like first, before loading the entire collection somewhere.

The output I don't want is this:

[{
(more key-value pairs)...,
"RegoExpiryDate": "2021-08-31T00:00:00.000Z",
"VIN": "JTMRBREVX0D087618",
"ComplianceDate": null,
"PriceEGC": {
  "$numberDecimal": "36268.00"  # <- Don't want this
},
"Price": null,
"Reserve": null,
"StartingBid": null,
"Odometer": 54567,
(more key-value pairs)...
}]

The output I want is this:

[{
(more key-value pairs)...,
"RegoExpiryDate": "2021-08-31T00:00:00.000Z",
"VIN": "JTMRBREVX0D087618",
"ComplianceDate": null,
"PriceEGC": 36268.00, (or) "PriceEGC": "36268.00",  # <- Want this
"Price": null,
"Reserve": null,
"StartingBid": null,
"Odometer": 54567,
(more key-value pairs)...
}]

How would I write the projection or the pipeline, so that I get what I want, as shown above? I've tried:

projection = {...,
"PriceEGC": "$Price.FinalDisplayPrice.$numberDecimal",
...
}

and

projection = {...,
"PriceEGC": {"$toDecimal": "$Price.FinalDisplayPrice"}
...
}

and

projection = {...,
"PriceEGC": Decimal128.to_decimal("$Price.FinalDisplayPrice")
...
}

and altering the pipeline as well

pipeline = [
    {"$match": {}},
    {"$project": projection},
    {"$toDecimal": "$Price.FinalDisplayPrice"},
    {"$skip": batch_size * num},
    {"$limit": batch_size},
]

Solution

  • What you are seeing is the bson.json_util() representation of a MongoDB Decimal128 object. For more information on this data type, see https://www.mongodb.com/developer/quickstart/bson-data-types-decimal128/

    The bson.json_util() function provides the $numberDecimal wrapper so that the data type is preserved should you wish later to load the data back in.

    If you want a different behaviour then you might want to use the regular json.dumps() and override the Decimal128 behaviour, e.g.

    from pymongo import MongoClient
    from bson.decimal128 import Decimal128
    import json
    import bson
    
    
    class CustomJsonEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, bson.decimal128.Decimal128):
                return float(obj.to_decimal())
    
    
    db = MongoClient()['mydatabase']
    collection = db['mycollection']
    
    collection.insert_one({'Price': {'FinalDisplayPrice': Decimal128('36268.00')}})
    print(json.dumps(list(collection.find({}, {'_id': 0})), indent=4, cls=CustomJsonEncoder))
    

    prints:

    [
        {
            "Price": {
                "FinalDisplayPrice": 36268.0
            }
        }
    ]