Tags: python, database, django, performance, django-mptt

How to optimize adding new nodes in `django-mptt`?


I am writing a script that synchronizes two databases. Some of the data should be stored as a tree, so I use django-mptt for the new DB. When syncing the DBs, I select the new data from the old DB and save it in the new one.

Is there a better way to add new nodes to a tree? At the moment my code looks like this:

...
# Add new data to DB
for new_record in new_records:
    # Find appropriate parent using data in 'new_record'
    parent = get_parent(new_record)

    # Create object which should be added using data in 'new_record'
    new_node = MyMPTTModel(...)
    new_node.insert_at(parent, save=True)
    # Equivalent to:
    # new_node.insert_at(parent, save=False)
    # new_node.save()

But it runs very slowly. I think this is because after each call to insert_at(..., save=True), django-mptt has to write the new node to the DB and update the left and right keys of the records that are already in the DB.

Is there a way to make django-mptt accumulate the changes each time I call insert_at and then apply them all together when I call save? Or is there another way to reduce the execution time?

Thanks in advance.


Solution

  • Firstly, don't use insert_at. It's not the reason for slow performance, but it's unnecessary and looks ugly. Just set node.parent:

    for new_record in new_records:
        new_node = MyMPTTModel(..., parent=get_parent(new_record))
        new_node.save()
    

    Now for the performance question. If you're using the latest mptt (git master, not 0.5.4), there's a context manager called delay_mptt_updates to prevent mptt from doing a lot of these updates until you've added all the nodes:

    from django.db import transaction

    with transaction.atomic():
        with MyMPTTModel.objects.delay_mptt_updates():
            for new_record in new_records:
                new_node = MyMPTTModel(..., parent=get_parent(new_record))
                new_node.save()
    

    Alternatively, if you're touching almost the entire tree, you can speed things up even more by using disable_mptt_updates and rebuilding the whole tree at the end:

    from django.db import transaction

    with transaction.atomic():
        with MyMPTTModel.objects.disable_mptt_updates():
            for new_record in new_records:
                new_node = MyMPTTModel(..., parent=get_parent(new_record))
                new_node.save()
        # Recompute the tree fields for all nodes in a single pass
        MyMPTTModel.objects.rebuild()
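
To get a feel for why a single rebuild can beat per-insert updates, here is a rough pure-Python sketch of the nested-set bookkeeping. It is only an in-memory model under simplifying assumptions, not django-mptt's actual implementation (which issues SQL UPDATEs and also tracks tree_id and level). The function names `insert_incrementally` and `rebuild_once` are made up for this illustration, and the tree is assumed to have a single root at index 0.

```python
from collections import defaultdict


def insert_incrementally(parents):
    """Insert nodes one at a time, shifting lft/rght keys after each insert.

    parents[i] is the index of node i's parent (None for the single root at
    index 0); a parent always comes before its children.
    Returns (bounds, rows_updated), where bounds maps node -> [lft, rght].
    """
    bounds = {}
    rows_updated = 0
    for node, parent in enumerate(parents):
        if parent is None:
            bounds[node] = [1, 2]
            continue
        # The new node becomes the last child, so it takes the slot
        # currently occupied by the parent's right key.
        insert_pos = bounds[parent][1]
        for other in bounds.values():
            shifted = False
            if other[0] > insert_pos:
                other[0] += 2
                shifted = True
            if other[1] >= insert_pos:
                other[1] += 2
                shifted = True
            rows_updated += shifted  # counts each existing row that changed
        bounds[node] = [insert_pos, insert_pos + 1]
    return bounds, rows_updated


def rebuild_once(parents):
    """Assign lft/rght keys in one depth-first pass over the finished tree."""
    children = defaultdict(list)
    for node, parent in enumerate(parents):
        if parent is not None:
            children[parent].append(node)

    bounds = {}
    counter = 1
    stack = [(0, False)]  # (node, children_already_visited)
    while stack:
        node, done = stack.pop()
        if done:
            bounds[node][1] = counter
        else:
            bounds[node] = [counter, None]
            stack.append((node, True))
            # Reverse so that children are visited in insertion order.
            for child in reversed(children[node]):
                stack.append((child, False))
        counter += 1
    return bounds, len(bounds)


# Worst case for incremental inserts: a chain where each node is the
# child of the previous one, so every ancestor shifts on every insert.
chain = [None] + list(range(99))
_, incremental_rows = insert_incrementally(chain)
_, rebuild_rows = rebuild_once(chain)
print(incremental_rows, rebuild_rows)
```

For this 100-node chain, the incremental version touches 4950 existing rows in total, while the rebuild writes each of the 100 rows once, and both end up with the same lft/rght values. Real trees are rarely this deep, so the gap in practice depends on the tree's shape and on where the new nodes land.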