Search code examples
mongodbmongoengineflask-mongoengine

MongoEngine - How to deference a List Field efficiently when converting to json


class Parent(document):
    name = StringField()
    children = ListField(ReferenceField('Child'))

class Child(document):
    name = StringField()
    parents = ListField(ReferenceField(Parent))

@app.route('/home/')
def home():
    parents = Parent.objects.all()
    return render_template('home.html', items=parents)

I have two collections similar to that above, which maintain a many to many relationship.

In the template with Angular, I'm setting a javascript variable to a list of Parents like so:

$scope.items = {{ parents|tojson }};

This results in an array of Parents who'se chilren are an array of Object Ids (references), as opposed to dereferenced child objects:

$scope.items = [{'$oid': '123', 'name': 'foo', 'children': [{'$oid': '456'}]}];

I want this angular object to contain all of the dereferenced children. Is there an efficient way to do this?

So far, this is the only approach that to works for me, at O(n^3). I've minimized the list comprehensions for clarity. The multiple obj['_id'] = {'$oid': str(obj['_id']} are necessary to convert the ObjectId to something that can be serialized to json.

@app.route('/home/')
def home():
    parents = Parent.objects.all()
    temps = []
    for parent in parents:
        p = parent.to_mongo()
        # At this point, the children of parent and p are references only
        p['_id'] = {'$oid': str(p['_id'])
        temp_children = []
        for child in parent.children:
            # Now the child is dereferenced
            c = child.to_mongo()
            c['_id'] = {$oid': str(c['_id'])}
            # Children have links back to Parent. Keep these as references.
            c['parents'] = [{'oid': str(parent_ref)} for parent_ref in c['parents']]
            temp_children.append(c)

        p['children'] = temp_children
        temps.append(parent.to_mongo())

    return render_template('home.html', items=temps)            

The following do not work but result in non-dereferenced children:

json.loads(json.dumps(accounts))

Solution

  • Because you are only storing the children as references then you are always going to have to go back to the server to dereference them when using the QuerySet.all method as you are above. The folks at mongodb know that this is a big performance problem when using drivers like pymongo so they have an aggregation framework to allow you to do the dereferencing on the server.

    The documentation for using this with mongoengine is pretty poor but a look at the unit tests in the mongoengine source help to fill in the blanks.

    With help from this answer and if you are using mongodb 3.2 or later then you can achieve what you are looking to do as follows:

    import mongoengine as db
    from bson.json_util import dumps
    
    class Parent(db.Document):
        name = db.StringField()
        children = db.ListField(db.ReferenceField('Child'))
    
    
    class Child(db.Document):
        name = db.StringField()
        parents = db.ListField(db.ReferenceField(Parent))
    
    
    pipeline = [{"$unwind": "$children"},
                {"$lookup":
                     {"from": "child",
                      "localField": "children",
                      "foreignField": "_id",
                      "as": "children"
                      }
                 },
                {"$group": {
                    "_id": "$_id",
                    "name": {"$first": "$name"},
                    "children": {"$push": "$children"}
                }
                }
                ]
    
    
    @app.route('/home/')
    def home():
        parents = []
        for p in Parent.objects.aggregate(*pipeline):
            parents.append(p)
        items= dumps(parents)
        return render_template('home.html', items=items)
    

    Then in your home.html you only need:

    $scope.items = {{ items }};
    

    The basic steps in the pipeline here are:

    1. Unwind the children: make a separate document for every child element in the children array
    2. Lookup the children: go to the child collection and lookup based on the _id and store the result in a children field on each document. Essentially replacing the ObjectID with the matching document.
    3. Group the results: by _id and include the name based on the first item in the grouping and push all the children fields into a field named children

    $lookup is only available from mongodb 3.2 and if you need to run an early version of mongodb then you'll have no choice but to make multiple queries. Also, $lookup won't work on sharded collections.