mongodb python-2.7 mongoengine

MongoEngine: how to recursively dereference a document and JSON-serialise it?


Given a MongoEngine document that has a few ReferenceFields, how can these be recursively converted to JSON?

So far, I've tried:

  1. to_mongo(), but the ReferenceFields were ignored.

  2. DeReference()(video._data), but I get only the repr() of each referenced model:

{
   'audience': 0,
   'channels': [],
   'created_date': datetime.datetime(2006, 1, 2, 12, 11, 3),
   'id': ObjectId('51af7076a2aa1c2179035e8e'),
   'images': {},
   'kind': u'VOD',
   'modified_date': datetime.datetime(2013, 6, 5, 19, 39, 4, 327000),
   'parts': [
      [<Media: VOD 0x0>,
       <Media: VOD 0x0>,
       <Media: VOD 0x0>,
       <Media: VOD 0x0>]
    ],
   'provider': <Provider: TEST>,
   'published_date': datetime.datetime(2006, 1, 2, 12, 11, 3),
   'sources': [<Source: TEST>],
   'titles': {u'en-US': u'TEST', u'es-ES': u'PRUEBA', u'pt-BR': u'TESTE'}
}

Are Media, Provider and Source objects not being serialized as they are supposed to be or am I missing something?


Solution

  • To get the value of a ReferenceField you need to call to_mongo() on the referenced document itself, not on the parent document: calling it on the parent just returns the DBRef or ObjectId of each reference.

    In your example you have three reference fields: provider, parts and sources. Given a document instance called my_doc, get the dictionary values of the referenced documents like this:

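    # to_mongo() on each referenced document returns its data as a dict-like object
    provider = my_doc.provider.to_mongo()
    # assumes parts is a flat list; the output above suggests a list of lists, which would need a nested comprehension
    parts = [part.to_mongo() for part in my_doc.parts]
    sources = [source.to_mongo() for source in my_doc.sources]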
    

    Side note

    The schema shown seems highly relational, and ReferenceFields cause in-app joins (MongoEngine has to query internally and dereference for you). This won't perform well at scale: to get all the parts of your document you need to query once for the document itself, once for the provider, once for the parts and once for the sources, so 4 queries to retrieve one document. For larger querysets, unless you use queryset.select_related(2), this means roughly 4 times as many queries as there are documents in the queryset, which may not be desirable.
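
The per-field to_mongo() calls above handle one level of references. For the recursive case asked about in the question, one possible approach is a small helper that walks the document itself. This is only a sketch under assumptions: it relies on documents exposing their field names through `_fields` (an internal MongoEngine attribute, so it may change between versions), and on attribute access triggering the dereference. The `dereference` and `to_json` helpers and the `FakeProvider` / `FakeVideo` classes are hypothetical stand-ins so the example runs without a database; they are not part of MongoEngine.

```python
import json
from datetime import datetime


def dereference(value):
    """Recursively turn a document-like object into plain dicts and lists."""
    # Assumption: documents list their field names in `_fields`, as
    # MongoEngine documents do; getattr() then returns the referenced
    # document rather than a DBRef.
    if hasattr(value, "_fields"):
        return {name: dereference(getattr(value, name)) for name in value._fields}
    if isinstance(value, dict):
        return {key: dereference(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [dereference(item) for item in value]
    return value


def to_json(document):
    # default=str is a catch-all for values json cannot encode natively,
    # such as datetime or ObjectId instances.
    return json.dumps(dereference(document), default=str, sort_keys=True)


# Hypothetical stand-ins for real documents, used only for illustration.
class FakeProvider(object):
    _fields = {"name": None}

    def __init__(self, name):
        self.name = name


class FakeVideo(object):
    _fields = {"kind": None, "provider": None, "parts": None, "created_date": None}

    def __init__(self, kind, provider, parts, created_date):
        self.kind = kind
        self.provider = provider
        self.parts = parts
        self.created_date = created_date


video = FakeVideo(
    kind="VOD",
    provider=FakeProvider("TEST"),
    parts=[[FakeProvider("part-1")]],  # nested list, as in the question's output
    created_date=datetime(2006, 1, 2, 12, 11, 3),
)
print(to_json(video))
```

Note that this sketch has no protection against circular references (document A referencing B referencing A), and every dereferenced field on a real document still costs a query, so the performance caveat from the side note applies here too.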