Search code examples
djangodjango-modelsdjango-1.10

Weird behavior in Django queryset union of values


I want to join the sum of related values from users with the users that do not have those values.

Here's a simplified version of my model structure:

class Answer(models.Model):
    person = models.ForeignKey(Person)
    points = models.PositiveIntegerField(default=100)
    correct = models.BooleanField(default=False)

class Person(models.Model):
    # irrelevant model fields

Sample dataset:

Person | Answer.Points
------ | ------
3      | 50
3      | 100
2      | 100
2      | 90

Person 4 has no answers and therefore, points

With the query below, I can achieve the sum of points for each person:

people_with_points = Person.objects.\
        filter(answer__correct=True).\
        annotate(points=Sum('answer__points')).\
        values('pk', 'points')

<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}]>

But, since some people might not have any related Answer entries, they will have 0 points and with the query below I use Coalesce to "fake" their points, like so:

people_without_points = Person.objects.\
        exclude(pk__in=people_with_points.values_list('pk')).\
        annotate(points=Coalesce(Sum('answer__points'), 0)).\
        values('pk', 'points')

<QuerySet [{'pk': 4, 'points': 0}]>

Both of these work as intended but I want to have them in the same queryset so I use the union operator | to join them:

everyone = people_with_points | people_without_points

Now, for the problem:

After this, the people without points have their points value turned into None instead of 0.

<QuerySet [{'pk': 2, 'points': 190}, {'pk': 3, 'points': 150}, {'pk': 4, 'points': None}]>

Anyone has any idea of why this happens?

Thanks!


Solution

  • I should mention that I can fix that by annotating the queryset again and coalescing the null values to 0, like this:

    everyone.\
        annotate(real_points=Concat(Coalesce(F('points'), 0), Value(''))).\
        values('pk', 'real_points')
    
    <QuerySet [{'pk': 2, 'real_points': 190}, {'pk': 3, 'real_points': 150}, {'pk': 4, 'real_points': 0}]>
    

    But I wish to understand why the union does not work as I expected in my original question.

    EDIT: I think I got it. A friend instructed me to use django-debug-toolbar to check my SQL queries to investigate further on this situation and I found out the following:

    Since it's a union of two queries, the second query annotation is somehow not considered and the COALESCE to 0 is not used. By moving that to the first query it is propagated to the second query and I could achieve the expected result.

    Basically, I changed the following:

    # Moved the "Coalesce" to the initial query
    people_with_points = Person.objects.\
        filter(answer__correct=True).\
        annotate(points=Coalesce(Sum('answer__points'), 0)).\
        values('pk', 'points')
    
    # Second query does not have it anymore
    people_without_points = Person.objects.\
        exclude(pk__in=people_with_points.values_list('pk')).\
        values('pk', 'points')
    
    # We will have the values with 0 here!
    everyone = people_with_points | people_without_points