
How to configure pymongo to automatically marshal and unmarshal protobufs


I'd like to have protobuf support in pymongo. Specifically, I want to be able to pass protobufs anywhere that I could pass a dict (such as collection.insert_one()) and I want to get the appropriate protobuf instance anywhere I'd otherwise get a dict (such as collection.find_one()).

So far, for my proof of concept, I'm able to do this by converting the protobuf to a Python dict (using json_format.MessageToDict()) and then passing the dict to pymongo, and vice versa. By "double conversion" I mean that I convert first to a dict and then to the final structure (either the protobuf or the BSON document). This double conversion is hacky and inefficient, and I haven't been able to get it to handle all the data types I want (such as bytes fields). The major downside is that, despite the name MessageToDict(), the protobuf json_format library produces a JSON-safe dict: bytes are converted to base64 strings, int64s are converted to strings, and so on.
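To make the type-conversion problem concrete, here is a small sketch, assuming the protobuf package is installed. It uses descriptor_pb2.UninterpretedOption as a stand-in for a generated message, only because that message ships with protobuf and happens to have both an int64 field and a bytes field, so no .proto file needs to be compiled:

```python
from google.protobuf import descriptor_pb2, json_format

# Stand-in for a generated message: negative_int_value is int64,
# string_value is bytes.
msg = descriptor_pb2.UninterpretedOption(
    negative_int_value=-5, string_value=b"\x00\x01")

# preserving_proto_field_name keeps snake_case keys instead of camelCase.
as_dict = json_format.MessageToDict(msg, preserving_proto_field_name=True)
print(as_dict)  # {'negative_int_value': '-5', 'string_value': 'AAE='}
```

A dict like this can be handed to pymongo, but both values would be stored as BSON strings rather than as an int64 and as binary data.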

I found somebody who seems to have done this in Go. (Note that my primary goal is to marshal and unmarshal, not to do the protobuf ID reflection as in his example.) I found the pymongo docs on custom types, but as I understand them, the TypeCodec class is only for individual types (such as Decimal), not for the entire message / document. I also found document_class, but that appears to apply only to decoding and requires a subclass of MutableMapping, which generated protobuf classes are not.

I found bson.encode() and bson.decode() but, short of monkeypatching, it's not clear how I would override or configure these.
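One way to avoid patching bson.encode()/bson.decode() at all is to convert at the collection boundary instead. The sketch below is a hypothetical ProtoCollection wrapper, not part of pymongo; it only forwards to the real Collection.insert_one() and Collection.find_one() methods. The to_doc/from_doc converters are injected, so it works with any message-to-dict conversion, and it assumes _id is not a field of the message:

```python
from typing import Any, Callable, Generic, Mapping, Optional, Type, TypeVar

M = TypeVar("M")


class ProtoCollection(Generic[M]):
    """Hypothetical wrapper around a pymongo Collection.

    Messages are converted on the way in and out, so bson.encode() and
    bson.decode() never need to be patched.
    """

    def __init__(
        self,
        collection: Any,                                # e.g. a pymongo Collection
        message_cls: Type[M],                           # a generated message class
        to_doc: Callable[[M], dict],                    # message -> dict
        from_doc: Callable[[Mapping[str, Any], M], M],  # (dict, empty message) -> message
    ) -> None:
        self._coll = collection
        self._cls = message_cls
        self._to_doc = to_doc
        self._from_doc = from_doc

    def insert_one(self, message: M, **kwargs: Any) -> Any:
        return self._coll.insert_one(self._to_doc(message), **kwargs)

    def find_one(self, filter: Optional[Mapping[str, Any]] = None,
                 **kwargs: Any) -> Optional[M]:
        doc = self._coll.find_one(filter, **kwargs)
        if doc is None:
            return None
        doc = dict(doc)
        doc.pop("_id", None)  # assumption: _id is not a field of the message
        return self._from_doc(doc, self._cls())
```

Because json_format.ParseDict(js_dict, message) already has the (dict, message) shape, the current double conversion could be plugged in as ProtoCollection(coll, MyMessage, json_format.MessageToDict, json_format.ParseDict), where coll and MyMessage are placeholders; swapping in a better converter later only changes those two arguments.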


Solution

  • I was unable to find anything that converts directly between BSON and protobuf, but I did find this old package, protobuf-to-dict, which converts between protobuf and dict. Unlike Google's json_format module, protobuf-to-dict does not helpfully convert data types that aren't in the JSON spec (e.g., int64s). (It also lets you override the type converters if desired.)

    While I'd still prefer not to have to convert via a Python dict, using this package solves the major problem of unwanted type conversion, so I cleaned it up a bit, updated it to Python 3 (which was trivial), and removed the base64 conversion of binary fields.