I have a model structure similar to this:
class DataFile(models.Model):
user = models.ForeignKey('data.User', verbose_name='Data View')
period = models.ForeignKey('data.Period', verbose_name='Period')
created = models.DateTimeField(auto_now_add=True,null=True)
I have a list of different tuples of users ids and periods ids, like
{(272, 42), (78, 39), (280, 9), (73, 35), (81, 35), (62, 31), ...}
and I need to retrieve the most recent model for each of these couples of values.
I couldn't come up with anything better than this:
data = []
my_data_list = DataFile.objects.all().values_list('user', "period").distinct()
for user, period in my_data_list:
x = DataFile.objects.filter(user_id=user, period_id=period).order_by('created').last()
data.append(x)
But is scaling very bad even for a little thousands elements in the database.
I am sure there is a more efficent way to retrieve this data, if you know one please provide it with some explanation
You can retrieve the DataFile
records by filtering out DataFile
s where there is a more recent record with an Exists
subquery [Django-doc], so:
from django.db.models import Exists, OuterRef
DataFile.objects.filter(
~Exists(
DataFile.objects.filter(
user_id=OuterRef('user_id'),
period_id=OuterRef('period_id'),
created__gt=OuterRef('created'),
)
),
)
For postgresql, you can however do this more effectively:
DataFile.objects.order_by('user', 'period', '-created').distinct('user', 'period')
this will thus only retrieve items with a distinct user and period, and since we ordered by created
in descending order, we will get the most recent one.