Search code examples
pythondjangodjango-rest-frameworkdjango-orm

How can I optimize @property method by prefetching or caching an "unrelated" table


I have three model objects, two of which directly relate to each other.

class Child(models.Model):
    date_of_birth = models.DateField()
    name = models.CharField(max_length=255)
    gender = models.CharField(max_length=10, choices=Gender.choices)

    @property
    def age(self):
        today = date.today()
        return age_on_date(date_calculating_on=today, date_of_birth=self.date_of_birth)
class ChildBmiCheck(models.Model):

    child = models.ForeignKey(Child, models.PROTECT, related_name="bmi_checks")
    height_in_meters = models.DecimalField(max_digits=40, decimal_places=10, null=True)
    weight_in_kgs = models.DecimalField(max_digits=40, decimal_places=10, null=True)
    child_was_present = models.BooleanField(default=True)

    @property
    def zscore(self):
        if self.child_was_present is True:
            age = self.child.age * 12  # in months
            gender = self.child.gender

            # TODO this is slow because for every child bmi check we hit the DB
            #  |
            #  V
            matching_data = ReferenceZscoreData.objects.filter(sex=gender).get(age=age)

            return self.calculate_z_score(matching_data)
        else:
            return None

And one that is used under the comment in the @property method above, which is not directly related through a foreign key relationship.

class ReferenceZscoreData(models.Model):
    sex = models.CharField(max_length=1, choices=Sex.choices, db_index=True)
    age = models.IntegerField(db_index=True)


    property1ForZscore = models.FloatField()
    property2ForZscore = models.FloatField()
    property3ForZscore = models.FloatField()

When I fetch Child BMI checks, I see that the DB is queried for the relavent ReferenceZscoreData every time this method is called. This makes the HTTP fetch in Django Rest Framework really slow. However, ReferenceZscoreData only changes once a year, so there is no reason to go to the database so often.

I tried adding this table to prefetch_related, but it isn't related through any ForeignKey relationship.

class ChildBmiCheckViewSet(viewsets.ModelViewSet):
    queryset = ChildBmiCheck.objects.prefetch_related(
        'child'
        # There is no reference_z_score_data because this is related through an @property only!
    ).all()
    serializer_class = ChildBmiCheckSerializer
    permission_classes = [permissions.AllowAny]

How can I prefetch or cache this value to optimize it?


Solution

  • There are multiple ways to fetch the data in one db hit.

    But before proceeding, I would suggest you add a DateTimeField or an IntegerField (representing age) to ChildBmiCheck, since one child can have multiple height and weight records, each links to a specific timestamp.

    Although following methods still uses Child.date_of_birth attribute.

    Method 1. Annotating Subquery

    from django.db.models import Func, OuterRef, Subquery
    from django.db.models.functions import Floor, Extract
    
    subquery_ref = ReferenceZscoreData.objects.filter(age=OuterRef('age'), sex=OuterRef('child__gender'))
    
    queryset = ChildBmiCheck.objects.annotate(
        # Extract(interval, 'epoch') returns total number of seconds
        # then divide by 3600, 24, 30 returns hours, days, months respectively
        age = Floor(Extract(Func('child__date_of_birth', function='AGE'), 'epoch') / 3600 / 24 / 30)
    ).annotate(
        prop1 = Subquery(subquery_ref.values('property1ForZscore')),
        prop2 = Subquery(subquery_ref.values('property2ForZscore')),
        prop3 = Subquery(subquery_ref.values('property3ForZscore'))
    ).select_related('child')
    

    And then in your ChildBmiCheck.zscore

    @property
    def zscore(self):
        if self.child_was_present:
            return self.calculate_z_score(self.age, self.child.gender, self.prop1, self.prop2, self.prop3)
        # You probably don't need another method to calculate zscore
        # Just write the calculation steps here.
    

    Please note I use Func(field_name, function='AGE') when annotating age. This only works when your database has AGE function. PostgreSQL has it built in while SQLite3 does not. You will need to write a custom age function yourself then.

    Method 2. Materialized View

    Materialized View is a database functionality that stores a query result into a temporary table, so that on next query, it pull the data directly, without the heavy calculation.

    Again, SQLite3 does not support it. Sorry.

    Syntax of creating materialized view varies by database. Take PostgreSQL as example.

     -- 1. Create a function to calculate zscore
     CREATE FUNCTION zscore(age numeric, gender text, prop1 numeric, prop2  numeric, prop3 numeric) RETURNS numeric AS $$
         -- zscore calculation steps
     $$ LANGUAGE SQL;
    
     -- 2. Create materialized view
     CREATE MATERIALIZED VIEW myapp_childbmicheck_zscore_matview AS
     SELECT b.*, zscore(floor(extract(epoch from date_trunc('month', age(c.date_of_birth))) / 3600 / 24 / 30, c.gender, r.property1ForZscore, r.property2ForZscore, r.property3ForZscore) as zscore
     FROM myapp_childbmicheck b
     LEFT JOIN myapp_child c ON c.id = b.child_id
     LEFT JOIN myapp_referencezscore r ON r.sex = c.gender AND r.age = floor(extract(epoch from date_trunc('month', age(c.date_of_birth))) / 3600 / 24 / 30)
     WITH DATA;
    

    Then in you myapp/models.py, create corresponding Model

    class ChildBmiCheckZscore(models.Model):
        child = models.ForeignKey(Child, models.DO_NOTHING)
        height_in_meters = models.DecimalField(max_digits=40, decimal_places=10, null=True)
        weight_in_kgs = models.DecimalField(max_digits=40, decimal_places=10, null=True)
        child_was_present = models.BooleanField(default=True)
        zscore = models.DecimalField(max_digits=40, decimal_places=10, null=True)
        class Meta:
            managed = False
            db_table = 'myapp_childbmicheck_zscore_mat_view'
    

    Then in your views

     queryset = ChildBmiCheckZscore.objects.all()
    

    Please note you will need to refresh the materialized view manually, like

    REFRESH MATERIALIZED VIEW myapp_childbmicheck_zscore_matview;
    

    You need to do this NOT annually, but at least monthly, even daily. One child's zscore may change once a year, but you have multiple children with birthdays on different months.

    Method 3. Add A Zscore Field To ChildBmiCheck

    This one is quite straight forward. Calculate zscore each time a ChildBmiCheck instance is created.