I am using PyMongo to simply iterate over a Mongo collection, but I'm struggling with handling large MongoDB date objects.
For example, if I have some data in a collection that looks like this:
"bad_data" : [
    {
        "id" : "id01",
        "label" : "bad_data",
        "value" : "exist",
        "type" : "String",
        "lastModified" : ISODate("2018-06-01T10:04:35.000Z"),
        "expires" : Date(9223372036854775000)
    }
]
I will do something like:
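from pymongo import MongoClient

# Credentials go to MongoClient (Database.authenticate() was removed in PyMongo 4).
client = MongoClient('localhost', username='user', password='pass', authSource='admin')
db = client['db1']
collection = db['collection']
for i in collection.find():  # a Collection is not iterable itself; find() returns a Cursor
    pass  # do something with i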
and I get the error: InvalidBSON: year 292278994 is out of range
Is there any way I can deal with this ridiculous Date() object without BSON falling over? I realise that having such a date in MongoDB is crazy, but there is nothing I can do about it as it's not my data.
There actually is a section in the PyMongo FAQ about this very topic:
Why do I get OverflowError decoding dates stored by another language’s driver?
PyMongo decodes BSON datetime values to instances of Python’s datetime.datetime. Instances of datetime.datetime are limited to years between datetime.MINYEAR (usually 1) and datetime.MAXYEAR (usually 9999). Some MongoDB drivers (e.g. the PHP driver) can store BSON datetimes with year values far outside those supported by datetime.datetime.
So the basic constraint here is the datetime.datetime type that the driver maps BSON dates to, and though such a value might be "ridiculous", it is perfectly valid for drivers in other languages to create it.
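To make the constraint concrete, here is a rough, stdlib-only sketch of the arithmetic. It assumes the stored value is the usual BSON date representation (a signed 64-bit count of milliseconds since the Unix epoch) and approximates a year as 365.2425 days; the helper names are illustrative only, and to_datetime is a stand-in for the conversion, not PyMongo's actual decoder.

```python
from datetime import MAXYEAR, datetime, timedelta, timezone

# The "expires" value from the question: a BSON date, i.e. a signed 64-bit
# count of milliseconds since 1970-01-01T00:00:00Z.
EXPIRES_MS = 9223372036854775000

# Mean Gregorian year in seconds (365.2425 days); good enough for an estimate.
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60


def approx_year(ms_since_epoch: int) -> int:
    """Estimate the calendar year of a BSON date without building a datetime."""
    return int(1970 + ms_since_epoch / 1000 / SECONDS_PER_YEAR)


def to_datetime(ms_since_epoch: int) -> datetime:
    """Hypothetical stand-in for the driver's conversion: epoch + offset."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=ms_since_epoch)


def decodes_as_datetime(ms_since_epoch: int) -> bool:
    """True if the value fits in datetime.datetime, False if it overflows."""
    try:
        to_datetime(ms_since_epoch)
    except OverflowError:
        return False
    return True


print("estimated year:", approx_year(EXPIRES_MS), "datetime.MAXYEAR:", MAXYEAR)
print("decodable?", decodes_as_datetime(EXPIRES_MS))
```

Run as-is, it should estimate a year of roughly 292 million (the same figure as in the InvalidBSON message) and report that the value cannot be represented, while the question's lastModified value converts without trouble.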
As the FAQ points out, your general workarounds are:
Deal with the offending BSON Date itself. Whilst valid to store, it was possibly not the "true" intention of whoever/whatever stored it in the first place.
Add a "date range" condition to your code to filter "out of range" dates:
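from datetime import datetime

# Only match documents whose 'expires' fits in datetime.datetime.
# Note: documents without an 'expires' field are excluded as well.
result = db['collection'].find({
    'expires': {'$gte': datetime.min, '$lte': datetime.max}
})
for i in result:
    pass  # do something with i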
Omit the offending date field from the projection if you don't need it for further processing:
# Exclude the field server-side so it is never decoded by Python.
result = db['collection'].find({}, projection={'expires': False})
for i in result:
    pass  # do something with i
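The date range filter works because the server compares BSON dates by their stored millisecond count. A minimal sketch of that arithmetic, assuming PyMongo's usual behaviour of treating naive datetimes as UTC and keeping only whole milliseconds, shows the window that datetime.min and datetime.max span and that the "expires" value lies far outside it. to_bson_millis and matches_range_filter are hypothetical helpers, not PyMongo functions.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_bson_millis(dt: datetime) -> int:
    """Milliseconds since the Unix epoch, truncated as a BSON date stores it.

    Naive datetimes are treated as UTC (assumed to match PyMongo's default).
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # Whole days and seconds first, then drop sub-millisecond precision.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


# Bounds implied by {'$gte': datetime.min, '$lte': datetime.max}.
LOW_MS = to_bson_millis(datetime.min)
HIGH_MS = to_bson_millis(datetime.max)


def matches_range_filter(stored_ms: int) -> bool:
    """Would a stored BSON date (in milliseconds) pass the range filter?"""
    return LOW_MS <= stored_ms <= HIGH_MS


if __name__ == "__main__":
    print("window:", LOW_MS, "to", HIGH_MS)
    print("expires in range?", matches_range_filter(9223372036854775000))
```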
Certainly the name 'expires' suggests that the original intent was a date so far into the future that it was never going to come about, and that the original author of that data (and very possibly the code still writing it) was not aware of Python's date constraint. So it's probably quite safe to "lower" that number in all documents, and wherever any code is still writing it.
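If you do decide to lower the stored value, one option is to do it entirely server-side so that nothing has to be decoded by Python first. This is a sketch under a few assumptions: the field is called 'expires', capping at the last millisecond of year 9999 is an acceptable "never expires" value for whatever reads the data, and `collection` is a PyMongo Collection. build_cap_update and cap_out_of_range_dates are hypothetical helper names.

```python
from datetime import datetime, timezone

# Assumed replacement value: the latest instant datetime.datetime can hold,
# truncated to whole milliseconds because that is all a BSON date stores.
PY_MAX_DATE = datetime(9999, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc)


def build_cap_update(field="expires", cap=PY_MAX_DATE):
    """Return (filter, update) documents that clamp oversized dates in `field`."""
    # $gt with a datetime only compares against BSON dates (type bracketing),
    # so this matches dates later than anything Python can decode.
    query = {field: {"$gt": cap}}
    update = {"$set": {field: cap}}
    return query, update


def cap_out_of_range_dates(collection, field="expires", cap=PY_MAX_DATE):
    """Clamp the offending dates server-side and return how many changed.

    `collection` is assumed to be a pymongo Collection; update_many() only
    returns an UpdateResult, so the bad values are never decoded client-side.
    """
    query, update = build_cap_update(field, cap)
    return collection.update_many(query, update).modified_count
```

For example, cap_out_of_range_dates(db['collection']) would rewrite the document from the question. Passing the same filter to count_documents() first lets you see how many documents would be affected, and dates before year 1 would need an analogous '$lt' condition against datetime.min.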