I have a MongoDB collection with documents like this:
{
_id: ObjectId('656f001650078dc1d50872ed'),
created_at: ISODate('2023-12-05T10:48:54.641Z'),
user_id: Binary.createFromBase64('MjkyMmZkYmUtOTM1Yi0xMWVlLTlkMjEtN2U2NjQwYmEyNGEw', 4),
}
Here is how I load the data in Python:
import os
from pymongo import MongoClient
client = MongoClient(os.getenv("MONGO_URL"))
collection = client.get_database(os.getenv("MONGO_DB")).get_collection(os.getenv("MONGO_COLLECTION"))
select = {'created_at': 1}
result = collection.find_one({}, select)
If I replace
select = {'created_at': 1}
with
select = {'user_id': 1}
I get this error:
...
File ~/.pyenv/versions/3.11.5/lib/python3.11/site-packages/pymongo/message.py:1619, in _OpMsg.unpack_response(self, cursor_id, codec_options, user_fields, legacy_response)
1617 # If _OpMsg is in-use, this cannot be a legacy response.
1618 assert not legacy_response
-> 1619 return bson._decode_all_selective(self.payload_document, codec_options, user_fields)
File ~/.pyenv/versions/3.11.5/lib/python3.11/site-packages/bson/__init__.py:1259, in _decode_all_selective(data, codec_options, fields)
1236 """Decode BSON data to a single document while using user-provided
1237 custom decoding logic.
1238
(...)
1256 .. versionadded:: 3.8
1257 """
1258 if not codec_options.type_registry._decoder_map:
-> 1259 return decode_all(data, codec_options)
1261 if not fields:
1262 return decode_all(data, codec_options.with_options(type_registry=None))
File ~/.pyenv/versions/3.11.5/lib/python3.11/site-packages/bson/__init__.py:1167, in decode_all(data, codec_options)
1164 if not isinstance(codec_options, CodecOptions):
1165 raise _CODEC_OPTIONS_TYPE_ERROR
-> 1167 return _decode_all(data, codec_options)
InvalidBSON: invalid length or type code
Here are the versions I use:
Python: 3.11.5
pymongo: 4.6.1
MongoDB: 5.0.23
It seems to come from the Binary.createFromBase64 value.
Does anybody have a clue?
Binary subtype 4 is a UUID, see https://bsonspec.org/spec.html.
The base64 string you show is 'MjkyMmZkYmUtOTM1Yi0xMWVlLTlkMjEtN2U2NjQwYmEyNGEw', which decodes to the following 36 bytes (in hexadecimal):
32 39 32 32 66 64 62 65 2d 39 33 35 62 2d 31 31 65 65 2d 39 64 32 31 2d 37 65 36 36 34 30 62 61 32 34 61 30
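Converting those bytes to an ASCII string gives "2922fdbe-935b-11ee-9d21-7e6640ba24a0"
In other words, the UUID was not properly encoded before being stored in the database: its 36-character text form was stored instead of the 16 raw bytes a UUID consists of, which is most likely why the driver fails to decode the field.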
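You can check this decoding yourself with a few lines of standard-library Python. This is only a quick sketch; the variable names `payload` and `stored` are mine, not from the question.

```python
import base64

# Base64 payload copied from the document shown in the question.
payload = "MjkyMmZkYmUtOTM1Yi0xMWVlLTlkMjEtN2U2NjQwYmEyNGEw"

stored = base64.b64decode(payload)

print(len(stored))             # 36 bytes, where a UUID should occupy 16
print(stored.decode("ascii"))  # 2922fdbe-935b-11ee-9d21-7e6640ba24a0
```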
The correct hexadecimal for that UUID would be
29 22 fd be 93 5b 11 ee 9d 21 7e 66 40 ba 24 a0
And the corresponding base64 should be
Binary.createFromBase64("KSL9vpNbEe6dIX5mQLokoA==",4)
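The correct values above can be reproduced with Python's standard `uuid` and `base64` modules. Again this is just a sketch, and the names `u`, `raw`, `hex_bytes` and `b64` are illustrative.

```python
import base64
import uuid

# Parse the text form that was mistakenly stored as the binary payload.
u = uuid.UUID("2922fdbe-935b-11ee-9d21-7e6640ba24a0")

raw = u.bytes                                # the 16 raw bytes of the UUID
hex_bytes = raw.hex(" ")                     # space-separated hex (Python 3.8+)
b64 = base64.b64encode(raw).decode("ascii")  # payload for a subtype 4 Binary

print(hex_bytes)  # 29 22 fd be 93 5b 11 ee 9d 21 7e 66 40 ba 24 a0
print(b64)        # KSL9vpNbEe6dIX5mQLokoA==
```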
Consider using pymongo's native UUID handling: https://pymongo.readthedocs.io/en/stable/examples/uuid.html