When a queryset is sorted by a field in a related model performance decreases drastically. I use mysql.
For example, I have two models:
class Event(models.Model):
idEvent = models.BigAutoField(primary_key=True)
created_at = models.DateTimeField(db_index=True, verbose_name=_('date'))
processed_at = models.DateTimeField(auto_now_add=True, verbose_name=_('processed'))
data = models.TextField()
class Meta:
ordering = ["-created_at"]
# [1154519 rows]
class AccessHistory(models.Model):
event = models.ForeignKey(Event, on_delete=models.CASCADE, blank=True, null=True)
result = models.TextField(verbose_name=_('result'))
# [1130603 rows]
If I do AccessHistory.objects.all().select_related('event').order_by('-event__created_at')
the query delays over 5s, if I swap select_related with prefetch_related I get the same delay. When I do the query without ordering it responds in the expected time (<1s)
-- AccessHistory.objects.all().select_related('event').order_by('-event__created_at')
SELECT `history_accesshistory`.`id`,
`history_accesshistory`.`event_id`,
`history_accesshistory`.`result`,
`history_event`.`idEvent`,
`history_event`.`created_at`,
`history_event`.`processed_at`,
`history_event`.`data`
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)
ORDER BY `history_event`.`created_at` DESC
-- AccessHistory.objects.all().prefetch_related('event').order_by('-event__created_at')
SELECT `history_accesshistory`.`id`,
`history_accesshistory`.`event_id`,
`history_accesshistory`.`result`
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)
ORDER BY `history_event`.`created_at` DESC
I tried with select_related, prefetch_related, indexing created_at field in Event model and setting a default order in Event model. Nothing of this improves the response time.
How I can optimize this without moving/copying created_at field to AccessHistory model?
EDIT 1: I use this queryset in django-tables2 view. This is why there is no limit.
EDIT 2:
@Rick James thanks for the answer. Now I understand the problem but I don't found how to solve it. The relation is not one to one, every AccessHistory has an Event but there are Events without AccessHistory. With INNER JOIN or JOIN it works as expected but I don't know how to do the query with django ORM.
I tried with AccessHistory.objects.select_related('event')
but it uses a LEFT OUTER JOIN:
SELECT `history_accesshistory`.`id`,
`history_accesshistory`.`event_id`,
`history_accesshistory`.`result`,
`history_event`.`idEvent`,
`history_event`.`created_at`,
`history_event`.`processed_at`,
`history_event`.`data`,
FROM `history_accesshistory` LEFT OUTER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)
EDIT: @Paul Spiegel LEFT JOIN is not required but I don't know how to avoid it using QuerySet API.
Thanks to @Rick James and @Paul Spiegel for pointing me in the right direction. The problem was the blank=True
and Null=True
in event field.
Now with
class AccessHistory(models.Model):
event = models.ForeignKey(Event, on_delete=models.CASCADE, blank=False, null=False)
result = models.TextField(verbose_name=_('result'))
It works as expected:
-- AccessHistory.objects.all().select_related('event').order_by('-event__created_at')
SELECT `history_accesshistory`.`id`,
`history_accesshistory`.`event_id`,
`history_accesshistory`.`result`,
`history_event`.`idEvent`,
`history_event`.`created_at`,
`history_event`.`processed_at`,
`history_event`.`data`
FROM `history_accesshistory` INNER JOIN `history_event` ON (`history_accesshistory`.`event_id` = `history_event`.`idEvent`)
ORDER BY `history_event`.`created_at` DESC