I'm building a job board. Each Job could have several associated Location objects.
I have designed my Location and Job models as follows:
class Location(BaseModel):
slug = models.CharField(unique=True)
city = models.OneToOneField(to="City", null=True)
state = models.ForeignKey(to="State", null=True)
country = models.ForeignKey(to="Country")
class Job(BaseModel):
title = models.CharField()
description = models.TextField()
locations = models.ManyToManyField(
to="Location",
related_name="jobs",
through="JobLocation",
)
So, a Location object has the flexibility to refer to our idea of a country (like United States), a state (like New York) or a city (like Manhattan). I populate the slug field of Location model like so:
If Location object is a country, I use the country name.
united-states
If Location object is a state, I use the (state name + country name).
new-york-united-states
If Location object is a city, I use (city name + state name + country name).
manhattan-new-york-united-states
With slug field populated in this manner, I can simplify querying all the jobs in a particular location using the endswith
lookup. For example, if there's a Python job in Manhattan, NY and a React job in Brooklyn, NY, I can get all the jobs in the state of New York like so:
Job.objects.filter(locations__slug__endswith="new-york-united-states").distinct()
Now, I would like to get a list of all my Location objects, and have each Location item annotated with number_of_jobs
. This number_of_jobs
should be the count of all jobs within that particular Location.
For example:
number_of_jobs=1
(The Python job)number_of_jobs=1
(The React job)number_of_jobs=2
(Python job + React job)Attempt 1:
The simplest solution, which is to Count
the Location.jobs
doesn't work. It shows 1 job for Manhattan and 1 job for Brooklyn, but 0 jobs for New York.
Location.objects.annotate(number_of_jobs=Count("jobs", distinct=True))
# 1 job for Manhattan
# 1 job for Brooklyn
# 0 job for New York
Attempt 2:
subquery = Job.objects.filter(locations__slug__endswith=OuterRef("slug")).distinct().count()
Location.objects.annotate(number_of_jobs=Subquery(subquery))
This also doesn't work, because count()
executes the queryset immediately.
Attempt 3:
Based on Yuvraj's response, I managed to get absolutely close to the answer, but not quite.
jobs = Job.objects.filter(locations__slug__endswith=OuterRef("slug"))
subquery = jobs.annotate(job_count=Count("id", distinct=True)).values("job_count")[:1]
Location.objects.annotate(number_of_jobs=Subquery(subquery))
If I run these expressions using debugsqlshell
of django-debug-toolbar
, and I copy the SQL that Django creates, and remove the GROUP BY clause, I get my answer 100% correctly. I don't yet understand why Django adds GROUP BY and why removing it gives the correct answer.
I'm at my wits' end with this queryset. I'd appreciate any help.
First off, thank you Yuvraj for your response. This answer builds on top of yours.
Turns out, Django automatically adds a GROUP BY clause when you use the Count
aggregate function. I don't fully understand why it adds the GROUP BY clause, and why removing it returns the correct results in my case, so I won't speculate. But here's the answer, also based on Sergei's comment on this answer:
class NonAggregateCount(Count):
# Gets rid of Django's automatic GROUP BY clause
contains_aggregate = False
jobs = Job.objects.filter(locations__slug__endswith=OuterRef("slug"))
subquery = jobs.annotate(job_count=NonAggregateCount("id", distinct=True)).values("job_count")
Location.objects.annotate(number_of_jobs=Subquery(subquery))