Imagine we have the Django ORM model Meetup
with the following definition:
class Meetup(models.Model):
language = models.CharField()
speaker = models.CharField()
date = models.DateField(auto_now=True)
I'd like to use a single query to fetch the language, speaker and date for the latest event for each language.
>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
{'speaker': u'mike', 'language': u'python', 'latest_date': ...},
{'speaker': u'ryan', 'language': u'python', 'latest_date': ...},
{'speaker': u'noah', 'language': u'node', 'latest_date': ...},
{'speaker': u'shawn', 'language': u'node', 'latest_date': ...},
]
D'oh! We're getting the latest event, but for the wrong grouping!
It seems like I need a way to GROUP BY
the language
but SELECT
on a different
set of fields?
Update - this sort of query seems fairly easy to express in SQL:
SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;
I'd love a way to do this without using Django's raw()
- is it possible?
Update 2 - after much searching, it seems there are similar questions on SO:
Update 3 - in the end, with @danihp's help, it seems the best you can do is two queries. I've used the following approach:
# Abuse the fact that the latest Meetup always has a higher PK to build
# a ValuesList of the latest Meetups grouped by "language".
latest_meetup_pks = (Meetup.objects.values("language")
.annotate(latest_pk=Max("pk"))
.values_list("latest_pk", flat=True))
# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
This question is a follow up to my previous question:
This is the kind of queries that are easy to explain but hard to write. If this be SQL I will suggest to you a CTE filtered query with row rank over partition by language ordered by date ( desc )
But this is not SQL, this is django query api. Easy way is to do a query for each language:
languages = Meetup.objects.values("language", flat = True).distinct.order_by()
last_by_language = [ Meetup
.objects
.filter( language = l )
.latest( 'date' )
for l in languages
]
This crash if some language don't has meetings. The other approach is to get all max data for each language:
last_dates = ( Meetup
.objects
.values("language")
.annotate(ldate=models.Max("date"))
.order_by() )
q= reduce(lambda q,meetup:
q | ( Q( language = meetup["language"] ) & Q( date = meetup["ldate"] ) ),
last_dates, Q())
your_query = Meetup.objects.filter(q)
Perhaps someone can explain how to do it in a single query without raw sql.
Edited due OP comment
You are looking for:
"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"
Not all rdbms supports this expression, because all fields that are not enclosed into aggregated functions on select clause should appear on group by clause. In your case, speaker
is on select clause (without aggregated function) but not appear in group by.
In mysql they are not guaranties than showed result speaker
was that match with max date. Because this, we are not facing a easy query.
Quoting MySQL docs:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause...However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group.
The most close query to match your requirements is:
Reults = ( Meetup
.objects
.values("language","speaker")
.annotate(ldate=models.Max("date"))
.order_by() )