
Load Mongo `Binary.createFromBase64` field in Python


I have a Mongo collection with documents like this:

  {
    _id: ObjectId('656f001650078dc1d50872ed'),
    created_at: ISODate('2023-12-05T10:48:54.641Z'),
    user_id: Binary.createFromBase64('MjkyMmZkYmUtOTM1Yi0xMWVlLTlkMjEtN2U2NjQwYmEyNGEw', 4),
  } 

Here is how I load the data in Python:

import os

from pymongo import MongoClient

client = MongoClient(os.getenv("MONGO_URL"))

collection = client.get_database(os.getenv("MONGO_DB")).get_collection(os.getenv("MONGO_COLLECTION"))

select = {'created_at': 1}

result = collection.find_one({}, select)

If I replace

select = {'created_at': 1}

with

select = {'user_id': 1}

I get this error:

...

File ~/.pyenv/versions/3.11.5/lib/python3.11/site-packages/pymongo/message.py:1619, in _OpMsg.unpack_response(self, cursor_id, codec_options, user_fields, legacy_response)
   1617 # If _OpMsg is in-use, this cannot be a legacy response.
   1618 assert not legacy_response
-> 1619 return bson._decode_all_selective(self.payload_document, codec_options, user_fields)

File ~/.pyenv/versions/3.11.5/lib/python3.11/site-packages/bson/__init__.py:1259, in _decode_all_selective(data, codec_options, fields)
   1236 """Decode BSON data to a single document while using user-provided
   1237 custom decoding logic.
   1238 
   (...)
   1256 .. versionadded:: 3.8
   1257 """
   1258 if not codec_options.type_registry._decoder_map:
-> 1259     return decode_all(data, codec_options)
   1261 if not fields:
   1262     return decode_all(data, codec_options.with_options(type_registry=None))

File ~/.pyenv/versions/3.11.5/lib/python3.11/site-packages/bson/__init__.py:1167, in decode_all(data, codec_options)
   1164 if not isinstance(codec_options, CodecOptions):
   1165     raise _CODEC_OPTIONS_TYPE_ERROR
-> 1167 return _decode_all(data, codec_options)

InvalidBSON: invalid length or type code

Here are the versions I use:

Python: 3.11.5
pymongo: 4.6.1
MongoDB: 5.0.23

The error seems to come from the Binary.createFromBase64 field.

Does anybody have a clue?


Solution

  • Binary subtype 4 is a UUID, see https://bsonspec.org/spec.html.

    The base64 string you show is 'MjkyMmZkYmUtOTM1Yi0xMWVlLTlkMjEtN2U2NjQwYmEyNGEw', which decodes to the following 36 bytes (in hexadecimal):

    32 39 32 32 66 64 62 65 2d 39 33 35 62 2d 31 31 65 65 2d 39 64 32 31 2d 37 65 36 36 34 30 62 61 32 34 61 30
    

    Converting those bytes to an ASCII string gives "2922fdbe-935b-11ee-9d21-7e6640ba24a0"

    In other words, the UUID was not properly encoded before being stored in the database: the field holds the 36-character text form of the UUID rather than its 16 raw bytes.

    The correct bytes for that UUID, in hexadecimal, would be

    29 22 fd be 93 5b 11 ee 9d 21 7e 66 40 ba 24 a0
    

    And the corresponding base64 should be

    Binary.createFromBase64("KSL9vpNbEe6dIX5mQLokoA==", 4)
    

    Consider using pymongo's native UUID handling: https://pymongo.readthedocs.io/en/stable/examples/uuid.html
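
The decoding steps above can be reproduced with the Python standard library alone. This is a minimal sketch rather than a fix for the stored data: the variable names are illustrative, and it only checks that the stored value is the UUID's text form and derives the 16-byte encoding a subtype 4 binary should carry.

```python
import base64
import uuid

# Base64 payload copied from the document shown in the question.
stored_b64 = "MjkyMmZkYmUtOTM1Yi0xMWVlLTlkMjEtN2U2NjQwYmEyNGEw"

# Decode the payload: it is 36 bytes long, whereas a UUID occupies 16 bytes.
raw = base64.b64decode(stored_b64)
print(len(raw))

# The bytes are plain ASCII: the textual form of the UUID.
text = raw.decode("ascii")
print(text)

# Parse the text into a real UUID and take its 16 raw bytes.
fixed = uuid.UUID(text)
fixed_bytes = fixed.bytes
print(fixed_bytes.hex(" "))

# Base64 of the 16 raw bytes, i.e. what the field should contain.
fixed_b64 = base64.b64encode(fixed_bytes).decode("ascii")
print(fixed_b64)
```

If the documents are rewritten from Python, `bson.binary.Binary.from_uuid(fixed)` from PyMongo is one way to wrap the parsed UUID as a subtype 4 binary. How to read and update the existing malformed documents is not covered here.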