Search code examples
pythondjangopostgresqlperformancequery-optimization

Django Queryset for most recent object with unique couple of values


I have a model structure similar to this:

class DataFile(models.Model):
    user = models.ForeignKey('data.User', verbose_name='Data View')
    period = models.ForeignKey('data.Period', verbose_name='Period')
    created = models.DateTimeField(auto_now_add=True,null=True)

I have a list of different tuples of users ids and periods ids, like {(272, 42), (78, 39), (280, 9), (73, 35), (81, 35), (62, 31), ...} and I need to retrieve the most recent model for each of these couples of values. I couldn't come up with anything better than this:

data = []
my_data_list = DataFile.objects.all().values_list('user', "period").distinct()
for user, period in my_data_list:
    x = DataFile.objects.filter(user_id=user, period_id=period).order_by('created').last()
    data.append(x)

But is scaling very bad even for a little thousands elements in the database.

I am sure there is a more efficent way to retrieve this data, if you know one please provide it with some explanation


Solution

  • You can retrieve the DataFile records by filtering out DataFiles where there is a more recent record with an Exists subquery [Django-doc], so:

    from django.db.models import Exists, OuterRef
    
    DataFile.objects.filter(
        ~Exists(
            DataFile.objects.filter(
                user_id=OuterRef('user_id'),
                period_id=OuterRef('period_id'),
                created__gt=OuterRef('created'),
            )
        ),
    )

    For , you can however do this more effectively:

    DataFile.objects.order_by('user', 'period', '-created').distinct('user', 'period')

    this will thus only retrieve items with a distinct user and period, and since we ordered by created in descending order, we will get the most recent one.