Having read the django docs on order_by
there is a note/warning that (if I have understood correctly) says that:
order_by
.I tried testing this out with a basic example:
class Pizza(models.Model):
name = models.CharField(max_length=100)
toppings = models.ManyToManyField('Topping', through='PizzaToppings')
class PizzaToppings(models.Model):
pizza = models.ForeignKey('Pizza', on_delete=models.CASCADE, related_name="pizza_toppings")
topping = models.ForeignKey('Topping', on_delete=models.CASCADE, related_name="pizzas_with_topping")
amount = models.IntegerField()
class Meta:
ordering = ["amount",]
class Topping(models.Model):
ingredient = models.CharField(max_length=100)
then
>>> p1 = Pizza.objects.create(name="Cheese and Tomato")
>>> p2 = Pizza.objects.create(name="Pepperoni")
>>> cheese = Topping.objects.create(ingredient="Cheese")
>>> tomato = Topping.objects.create(ingredient="Tomato puree")
>>> p1.toppings.add(cheese, through_defaults={"amount":4})
>>> p1.toppings.add(tomato, through_defaults={"amount":3})
>>> p2.toppings.add(cheese, through_defaults={"amount":2})
>>> p2.toppings.add(tomato, through_defaults={"amount":1})
So far, so normal. But this is where things get confusing:
>>> q1 = Topping.objects.all()
<QuerySet [<Topping: Topping object (1)>, <Topping: Topping object (2)>]>
>>> q2 = p1.toppings.all()
<QuerySet [<Topping: Topping object (1)>, <Topping: Topping object (2)>]>
>>> q1.order_by("pizzas_with_topping")
<QuerySet [<Topping: Topping object (2)>, <Topping: Topping object (1)>, <Topping: Topping object (2)>, <Topping: Topping object (1)>]>
>>> q2.order_by("pizzas_with_topping")
<QuerySet [<Topping: Topping object (2)>, <Topping: Topping object (1)>]>
As can be seen above, the querysets are identical in terms of the elements that they contain. But when one q1
is ordered we get the behaviour described in the documentation. In q2
we don't get that behaviour. Presumably this is because django is doing something clever since the queryset is concerned with toppings related to p1
.
What is actually going on 'under the hood' to enforce this behaviour? The querysets are the same (if I have understood correctly), so why does order_by
behave differently for the two querysets.
The two querysets are different. The first queryset represents a query like:
-- q1
SELECT *
FROM topping
The query that is represented by q2
looks like:
-- q2
SELECT *
FROM topping
INNER JOIN pizzatoppings ON pizzatoppings.topping_id = topping.id
WHERE pizzatoppings.pizza_id = id-of-pizza
If you then perform a .order_by('pizzas_with_topping')
, then you thus make a JOIN
on the table of the PizzaToppings
model, and you thus order by the primary key of that table. For the first queryset this thus looks like:
-- q1.order_by('pizzas_with_topping')
SELECT *
FROM topping
LEFT OUTER JOIN pizzatoppings ON pizzatoppings.topping_id = topping.id
ORDER BY pizzatoppings.id
for the latter you filter on the join that already exists:
-- q2.order_by('pizzas_with_topping')
SELECT *
FROM topping
INNER JOIN pizzatoppings ON pizzatoppings.topping_id = topping.id
WHERE pizzatoppings.pizza_id = id-of-pizza
ORDER BY pizzatoppings.id
this thus means that if the same topping is used for multiple pizzas, for q1
, it will appear each time for each pizza whereas for q2
we already have filtered on the pizza, and thus retrieve each Topping
for that pizza, not the other ones.