Search code examples
djangomany-to-manymanytomanyfield

Understanding order_by of multi-valued fields (Django)


Having read the django docs on order_by there is a note/warning that (if I have understood correctly) says that:

  • If you are ordering a queryset using a multi-valued field, then every element in that queryset that has multiple related items, will be added multiple times to the resulting queryset created by order_by.

I tried testing this out with a basic example:

Minimal Reproducible Example

class Pizza(models.Model):
    name = models.CharField(max_length=100)
    toppings = models.ManyToManyField('Topping', through='PizzaToppings')

class PizzaToppings(models.Model):
    pizza = models.ForeignKey('Pizza', on_delete=models.CASCADE, related_name="pizza_toppings")
    topping = models.ForeignKey('Topping', on_delete=models.CASCADE, related_name="pizzas_with_topping")
    amount = models.IntegerField()

    class Meta:
        ordering = ["amount",]

class Topping(models.Model):
    ingredient = models.CharField(max_length=100)

then

>>> p1 = Pizza.objects.create(name="Cheese and Tomato")
>>> p2 = Pizza.objects.create(name="Pepperoni")
>>> cheese = Topping.objects.create(ingredient="Cheese")
>>> tomato = Topping.objects.create(ingredient="Tomato puree")
>>> p1.toppings.add(cheese, through_defaults={"amount":4})
>>> p1.toppings.add(tomato, through_defaults={"amount":3})
>>> p2.toppings.add(cheese, through_defaults={"amount":2})
>>> p2.toppings.add(tomato, through_defaults={"amount":1})

So far, so normal. But this is where things get confusing:

>>> q1 = Topping.objects.all()
<QuerySet [<Topping: Topping object (1)>, <Topping: Topping object (2)>]>
>>> q2 = p1.toppings.all()
<QuerySet [<Topping: Topping object (1)>, <Topping: Topping object (2)>]>
>>> q1.order_by("pizzas_with_topping")
<QuerySet [<Topping: Topping object (2)>, <Topping: Topping object (1)>, <Topping: Topping object (2)>, <Topping: Topping object (1)>]>
>>> q2.order_by("pizzas_with_topping")
<QuerySet [<Topping: Topping object (2)>, <Topping: Topping object (1)>]>

The problem

As can be seen above, the querysets are identical in terms of the elements that they contain. But when one q1 is ordered we get the behaviour described in the documentation. In q2 we don't get that behaviour. Presumably this is because django is doing something clever since the queryset is concerned with toppings related to p1.

The question(s)

What is actually going on 'under the hood' to enforce this behaviour? The querysets are the same (if I have understood correctly), so why does order_by behave differently for the two querysets.


Solution

  • The two querysets are different. The first queryset represents a query like:

    -- q1
    SELECT *
    FROM topping
    

    The query that is represented by q2 looks like:

    -- q2
    SELECT *
    FROM topping
    INNER JOIN pizzatoppings ON pizzatoppings.topping_id = topping.id
    WHERE pizzatoppings.pizza_id = id-of-pizza

    If you then perform a .order_by('pizzas_with_topping'), then you thus make a JOIN on the table of the PizzaToppings model, and you thus order by the primary key of that table. For the first queryset this thus looks like:

    -- q1.order_by('pizzas_with_topping')
    SELECT *
    FROM topping
    LEFT OUTER JOIN pizzatoppings ON pizzatoppings.topping_id = topping.id
    ORDER BY pizzatoppings.id

    for the latter you filter on the join that already exists:

    -- q2.order_by('pizzas_with_topping')
    SELECT *
    FROM topping
    INNER JOIN pizzatoppings ON pizzatoppings.topping_id = topping.id
    WHERE pizzatoppings.pizza_id = id-of-pizza
    ORDER BY pizzatoppings.id

    this thus means that if the same topping is used for multiple pizzas, for q1, it will appear each time for each pizza whereas for q2 we already have filtered on the pizza, and thus retrieve each Topping for that pizza, not the other ones.