Tags: python, django, postgresql, triggers, models

Django "emulate" database trigger behavior on bulk insert/update/delete


It's a fairly self-explaining question, but here we go. I'm creating a business app in Django, and I didn't want to "spread" the logic across both the app AND the database, but on the other hand, I also didn't want to let the database handle this task on its own (which is possible through the use of triggers).

So I wanted to "reproduce" the behavior of database triggers, but inside the model class in Django (I'm currently using Django 1.4).

After some research, I figured out that for single objects I could override the save and delete methods of the models.Model class, inserting "before" and "after" hooks so they are executed before and after the parent's save/delete. Like this:

    from django.db import models
    from django.db.transaction import commit_on_success

    class MyModel(models.Model):

        def __before(self):
            pass

        def __after(self):
            pass

        @commit_on_success  # the decorator only ensures everything occurs inside the same transaction
        def save(self, *args, **kwargs):
            self.__before()
            super(MyModel, self).save(*args, **kwargs)
            self.__after()

The BIG problem is with bulk operations. Django doesn't trigger the save/delete of the models when running update()/delete() from a QuerySet; instead, it uses the QuerySet's own methods. And to make things a little bit worse, it doesn't send any signals either.
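
For example, a minimal sketch assuming the MyModel above has a hypothetical status field:

    # Single-object path: goes through MyModel.save(), so the
    # __before()/__after() hooks run.
    obj = MyModel.objects.get(pk=1)
    obj.status = 'closed'
    obj.save()

    # Bulk path: a single SQL UPDATE issued by the QuerySet itself.
    # MyModel.save() is never called and no pre_save/post_save signals are sent.
    MyModel.objects.filter(status='open').update(status='closed')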

Edit: just to be a little more specific: the models are loaded dynamically inside the view, so it's impossible to define a "model-specific" solution. In this case, I should create an abstract class and handle it there, as sketched below.
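
One way to handle that, sketched with a hypothetical TriggerModel base class (the name is mine), is an abstract model that carries the hooks, so every concrete model inheriting from it gets them without repeating the save() override:

    from django.db import models
    from django.db.transaction import commit_on_success

    class TriggerModel(models.Model):

        class Meta:
            abstract = True

        # Single underscore instead of double, so subclasses can override
        # the hooks without Python's name mangling getting in the way.
        def _before(self):
            pass

        def _after(self):
            pass

        @commit_on_success
        def save(self, *args, **kwargs):
            self._before()
            super(TriggerModel, self).save(*args, **kwargs)
            self._after()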

My last attempt was to create a custom Manager and, in it, override the update method, looping over the models inside the queryset and triggering the save() of each model (taking into account the implementation above, or the signals system). It works, but it results in a database "overload" (imagine a 10k-row queryset being updated).
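
That rejected attempt probably looked something like this rough sketch (class names are mine): correct, but it turns a single bulk update into one UPDATE per row:

    from django.db import models
    from django.db.models.query import QuerySet

    class NaiveQuerySet(QuerySet):

        def update(self, **kwargs):
            # Go through the normal save() for every instance so the hooks
            # (or signals) fire, at the cost of N queries for N rows.
            count = 0
            for instance in self:
                for field, value in kwargs.items():
                    setattr(instance, field, value)
                instance.save()
                count += 1
            return count

        update.alters_data = True

    class NaiveManager(models.Manager):

        def get_query_set(self):  # spelled get_queryset() from Django 1.6 on
            return NaiveQuerySet(self.model, using=self._db)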


Solution

  • With a few caveats, you can override the queryset's update method to fire the signals, while still using an SQL UPDATE statement:

    from django.db.models.query import QuerySet
    from django.db.models.signals import pre_save, post_save
    from django.db.transaction import commit_on_success

    class CustomQuerySet(QuerySet):
        @commit_on_success
        def update(self, **kwargs):
            for instance in self:
                pre_save.send(sender=instance.__class__, instance=instance, raw=False,
                              using=self.db, update_fields=kwargs.keys())
            # use self instead of self.all() if you want to reload all data
            # from the db for the post_save signal
            result = super(CustomQuerySet, self.all()).update(**kwargs)
            for instance in self:
                post_save.send(sender=instance.__class__, instance=instance, created=False,
                               raw=False, using=self.db, update_fields=kwargs.keys())
            return result

        update.alters_data = True


    I clone the current queryset (using self.all()), because the update method will clear the cache of the queryset object.
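
    To see why the clone matters, here is a rough illustration (MyModel and its status field are hypothetical):

        qs = MyModel.objects.filter(status='open')
        list(qs)                    # evaluates qs and fills its result cache

        qs.update(status='closed')  # QuerySet.update() also clears qs's cache
        list(qs)                    # iterating again re-runs the SQL query

        # Calling update() on a clone (self.all()) leaves the original
        # queryset's cached instances intact for the post_save loop.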

    There are a few issues that may or may not break your code. First of all, it introduces a race condition: you do something in the pre_save signal's receivers based on data that may no longer be accurate by the time you update the database.

    There may also be some serious performance issues with large querysets. Unlike the update method, all models will have to be loaded into memory, and then the signals still need to be executed. Especially if the signals themselves have to interact with the database, performance can be unacceptably slow. And unlike the regular pre_save signal, changing the model instance will not automatically cause the database to be updated, as the model instance is not used to save the new data.

    There are probably some more issues that will cause a problem in a few edge cases.

    Anyway, if you can handle these issues without having some serious problems, I think this is the best way to do this. It produces as little overhead as possible while still loading the models into memory, which is pretty much required to correctly execute the various signals.
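
    To actually make models use this queryset, it still has to be returned by a manager. A minimal Django 1.4-style sketch (the Order model and SignalingManager name are my own):

        from django.db import models

        class SignalingManager(models.Manager):

            def get_query_set(self):  # get_queryset() from Django 1.6 onwards
                return CustomQuerySet(self.model, using=self._db)

        class Order(models.Model):
            status = models.CharField(max_length=20)

            objects = SignalingManager()

        # Still a single SQL UPDATE, but pre_save/post_save are now sent
        # for every matched instance:
        Order.objects.filter(status='open').update(status='closed')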