I'm trying to store objects in MongoDB. These objects come from a third-party system and have a very specific format: all object properties are stored in a dictionary. The values in this dictionary can be of different types and are in no particular order.
I believe that to search on these fields effectively I need to turn them into BSON properties. That is doable with a custom serializer/deserializer, until it comes to deserialization itself. If a property is a complex object represented as a BSON document, the custom deserializer doesn't know which type that document should be transformed into.
How are issues like this properly solved with MongoDB BSON?
I would add a new property, $type, to the complex document and store the destination type there during serialization, but that interferes with MongoDB's built-in $type.
Is it possible to use the standard and a custom $type attribute side by side? What's the best-practice approach for implementing a custom deserializer in this case?
Not without extending the spec itself or including, in the document itself, some reference to how it should be (de)serialized.
The PHP driver has an ODM framework that does exactly what you're proposing. I suggest you look at http://php.net/manual/en/class.mongodb-bson-persistable.php
During serialization, the driver will inject a __pclass property containing the PHP class name into the data.
So, it adds a specific key, "__pclass", to the document to be stored. During deserialization, the driver reads that key to decide which deserialization steps to take, and strips the __pclass key/value before it returns the document (now deserialized into whatever PHP class the __pclass key specifies) to the user.
This is incredibly dangerous if you have any reason not to trust the data held in MongoDB. It basically allows stored data to dictate which executable PHP code gets called.
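The discriminator-key idea, with a guard against the risk just mentioned, can be sketched roughly as follows. This is only an illustration in plain Python, not the PHP driver's implementation: the key name `_type`, the `Address` and `Money` classes, and the `REGISTRY` allowlist are all made up for the example. The point is that the stored name is only ever looked up in a fixed table, so the data cannot select arbitrary code.

```python
from dataclasses import dataclass

# Hypothetical discriminator key; it avoids the "$" prefix that MongoDB reserves.
TYPE_KEY = "_type"


@dataclass
class Address:  # illustrative type, not from the original question
    city: str
    zip_code: str


@dataclass
class Money:  # illustrative type, not from the original question
    amount: int
    currency: str


# Allowlist: only these names may ever be instantiated during decoding.
REGISTRY = {"Address": Address, "Money": Money}


def encode(value):
    """Turn known objects into plain dicts tagged with their type name."""
    if type(value) in REGISTRY.values():
        doc = {k: encode(v) for k, v in vars(value).items()}
        doc[TYPE_KEY] = type(value).__name__
        return doc
    if isinstance(value, dict):
        return {k: encode(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode(v) for v in value]
    return value


def decode(value):
    """Rebuild objects from tagged dicts, stripping the discriminator key."""
    if isinstance(value, list):
        return [decode(v) for v in value]
    if not isinstance(value, dict):
        return value
    fields = {k: decode(v) for k, v in value.items() if k != TYPE_KEY}
    if TYPE_KEY not in value:
        return fields  # an ordinary sub-document, no target type recorded
    name = value[TYPE_KEY]
    cls = REGISTRY.get(name) if isinstance(name, str) else None
    if cls is None:
        # Never resolve names dynamically; unknown names are rejected.
        raise ValueError(f"unknown type discriminator: {name!r}")
    return cls(**fields)
```

Because the tagged values stay ordinary sub-documents, their fields remain queryable as nested properties; the discriminator is just one more field.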
About the spec itself: http://bsonspec.org/spec.html
The types and their associated type codes are hard-coded into the spec.
You could create your own user-defined binary subtype if you stored the blob in a binary element, using the user-defined subtype range.
The downside is that the object would be stored in the database as a binary blob, making it very difficult to query beyond subtype checking.
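To make the binary-subtype option concrete, here is a small hand-rolled sketch of how a single binary element is laid out according to the BSON spec: type byte 0x05, the field name as a C string, an int32 payload length, one subtype byte, then the payload. The function names and the choice of 0x80 as the subtype are assumptions for the example; in practice you would let your driver's BSON library do this encoding.

```python
import struct

# 0x80-0xFF is the user-defined subtype range; 0x80 is an arbitrary pick here.
USER_SUBTYPE = 0x80


def encode_binary_document(name: str, payload: bytes, subtype: int = USER_SUBTYPE) -> bytes:
    """Encode a BSON document holding one binary element (hypothetical helper)."""
    if not 0x80 <= subtype <= 0xFF:
        raise ValueError("subtype must be in the user-defined range 0x80-0xFF")
    key = name.encode("utf-8")
    if b"\x00" in key:
        raise ValueError("field names cannot contain a null byte")
    element = (
        b"\x05"                            # element type: binary
        + key + b"\x00"                    # field name as a C string
        + struct.pack("<i", len(payload))  # payload length, little-endian int32
        + bytes([subtype])                 # subtype byte
        + payload
    )
    total = 4 + len(element) + 1           # length prefix + element + terminator
    return struct.pack("<i", total) + element + b"\x00"


def decode_binary_document(data: bytes):
    """Decode a document produced by encode_binary_document."""
    (total,) = struct.unpack_from("<i", data, 0)
    if total != len(data) or data[-1] != 0 or data[4] != 0x05:
        raise ValueError("not a single-binary-element document")
    name_end = data.index(b"\x00", 5)
    name = data[5:name_end].decode("utf-8")
    (length,) = struct.unpack_from("<i", data, name_end + 1)
    subtype = data[name_end + 5]
    start = name_end + 6
    return name, subtype, data[start:start + length]
```

Everything the application cares about lives inside `payload`, which the server sees only as opaque bytes; that is why querying on the object's own fields is not practical with this approach.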
Anything beyond that would involve extending the specification itself.