I have a model like this (simplified):
class Events(models.Model):
reference_date = models.DateField()
event = models.IntegerField(db_index=True)
created_at = models.DateTimeField()
updated_at = models.DateTimeField()
class Meta:
unique_together = (('reference_date', 'event'),)
I can have the same event
with multiple reference dates, but I only have one event
per reference_date
. The table data looks something like this:
id | reference_date | event | created_at | updated_at |
---|---|---|---|---|
1 | 2022-01-01 | 12345 | 2022-03-05 18:18:03 | 2022-03-06 18:12:09 |
2 | 2022-01-02 | 12345 | 2022-03-08 08:05:11 | 2022-03-08 08:05:55 |
3 | 2022-01-08 | 12345 | 2022-06-15 18:18:12 | 2022-06-16 02:23:11 |
4 | 2022-01-01 | 98765 | 2022-01-11 07:55:25 | 2022-01-13 08:45:12 |
5 | 2022-01-02 | 98765 | 2022-06-22 10:25:08 | 2022-07-05 18:55:08 |
6 | 2022-01-09 | 45678 | 2022-02-19 12:55:07 | 2022-04-16 12:21:05 |
7 | 2022-01-10 | 45678 | 2022-03-05 11:23:45 | 2022-03-05 18:55:03 |
I need the latest record for each event
. But I need all the event
attributes, not just the max(reference_date)
I'm looking for this result:
[
{'id': 3, 'event': 12345, 'reference_date': '2022-01-08', 'created_at': '2022-06-15 18:18:12', 'updated_at': '2022-06-16 02:23:11'},
{'id': 5, 'event': 98765, 'reference_date': '2022-01-02', 'created_at': '2022-06-22 10:25:08', 'updated_at': '2022-07-05 18:55:08'},
{'id': 7, 'event': 45678, 'reference_date': '2022-01-10', 'created_at': '2022-03-05 11:23:45', 'updated_at': '2022-03-05 18:55:03'}
]
From a 'sql-perspective' I could get the results in multiple ways: SUBQUERIES, ROW_NUMBER, CORRELATED SUBQUERY etc. In this particular case for clarity reasons I prefer to use a join with itself using an aggregate inside the subquery.
If I'd write a raw query, I'd would do like this:
SELECT
e.*
FROM events as e
INNER JOIN (
SELECT
event,
max(reference_date) as reference_date
FROM events
GROUP BY event) AS b
ON b.reference_date = e.reference_date AND b.event = e.event
I think this query is very straightforward and has a good performance for the data volume in this table (hundreds of thousands of records)
I looked for several ways to build this type of query (Subquery, PrefetchRelated etc) but I couldn't find any proper way to translate this query into Django ORM syntax.
Is there a way using the 'django-orm way' without a convoluted ORM solution with the same reasonable performance?
ps: I need this query (or similar ones) for different database engines (PostgreSql, MySQL, MSQLServer ..)
After some time with another projects, learning and digging through the Django documentation and other StackOverflow threads, I've discovered a method to retrieve the latest record using the QuerySet API:
from django.db.models import OuterRef, Subquery, Max, F
# Creating the subquery with the last reference date by event
last_event_subquery = (
Events
.objects
.values('event')
.annotate(last_reference_date = Max('reference_date'))
.filter(event=OuterRef('event'))
)
# Query to fetch the latest events
latest_events = (
Events
.objects
.annotate(
last_reference_date=Subquery(last_event_subquery.values('last_reference_date'))
)
.filter(reference_date=F('last_reference_date'))
)
This approach employs the Max
function in conjunction with Subquery
and OuterRef
to retrieve the most recent record for each event.
It then utilizes F()
expressions to filter and fetch the latest events based on their reference date.
With this queryset I can retrieve all Event
instance attributes the way I originally wanted.