Search code examples
djangopostgresqldjango-ormtimestamp-with-timezone

Using django's ORM to average over "timestamp without time zone" postgres field


Backstory (can skip): I have a database with records of events. Events for each label occurred around a certain period in the year. I would like to find when in the year, more or less, that group of events occurred. Therefore I planned to calculate the average timestamp per group, and do this efficiently with postgres, instead of fetching all the timestamps and calculating locally.

The question: I'm trying to average a timestamp without time zone postgres field with django's excellent ORM like so:

from django.db.models import Avg
ModelName.objects.filter(a_field='some value').aggregate(Avg('time'))

However I'm getting:

function avg(timestamp without time zone) does not exist
LINE 1: SELECT AVG("model_name"."time") AS "time__avg" FROM "m...
           ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.

Is there a way to do this with django's ORM?
If not, how then do I workaround?


Solution

  • I had a similar problem, where I wanted to find the average time taken to vote for a particular item. But postgres wouldn't allow taking the average of datetimes. Doing so would result in the following error:

    django.db.utils.ProgrammingError: function avg(timestamp with time zone) does not exist
    LINE 1: SELECT "votes_item"."name", AVG("votes_vote"."datetime") AS ...
                                        ^
    HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
    

    To make this simpler consider the following tables, where the Vote has foreign key relation to Item:

    The Item table:

    id: pK name: char datetime(time at which the item was inserted)
    1 Apple 22-06-23 11:25:33
    2 Orange 22-06-22 01:22:18

    The Vote table:

    id: pK user: Fk (user who voted for the item) item: Fk (the item the user voted) vote: (1 for positive vote and -1 negative vote) datetime (time at which the item was voted)
    1 1 1 1 2022-06-22 11:26:18
    2 3 1 1 2022-06-21 12:26:36
    3 2 1 1 2022-06-26 01:20:59

    I wanted to know the average time at which users voted for each item. For eg: all the avg time taken by users to vote Apple (i.e annotate)

    Since postgres avg function doesn't directly take in datetime, first convert it to seconds then take the average and convert it back to datetime.

    To make things simpler create two class as shown below.

    from django.db import models
    
    class Epoch(models.expressions.Func):
    
        template = 'EXTRACT(epoch FROM %(expressions)s)::FLOAT'
        output_field = models.FloatField()
    
    
    class DateTimeFromFloat(models.expressions.Func):
    
        template = 'To_TIMESTAMP(%(expressions)s)::TIMESTAMP at time zone \'UTC\''
        output_field = models.DateTimeField()
    

    read more about Func in this excellent answer

    Now I wanted to get the average time at which each item was voted positive

    So I use

    Item.objects.filter(vote__vote=1).annotate(avg_time=DateTimeFromFloat(Avg(Epoch('vote__datetime')))).values('avg_time', 'name')
    

    important part:

    annotate(avg_time=DateTimeFromFloat(Avg(Epoch('vote__datetime'))))
    

    output

    <QuerySet [{'name': 'Apple', 'avg_time': datetime.datetime(2022, 6, 23, 8, 24, 37, 666667, tzinfo=datetime.timezone.utc)}]>
    

    You can perform a similar operation using aggregate.