Search code examples
djangodjango-ormdjango-jsonfield

Ordering Django queryset using attributes of a JSONField, provided the attribute is not present everywhere


This is a very complex question, so let me explain. I have a model called Person which stores most of its data in a JSONField.

class Person(models.Model):
    data = JSONField()

Now, the data field is usually in the the following format:

{"name" : <String>, "age" : <int>}

Now, what I want to do is create a queryset of Person, that orders the objects using the age attribute from its data field, in descending order. This is solved using the following code:

from django.db.models.expressions import RawSQL
from .models import Person

qs = Person.objects.annotate(age=RawSQL("(data->>'age')::int", [])).order_by('-age')

This is great, and works well. However, during testing, I changed the data attribute of one Person object to something like this:

{"name" : <String>, "profession" : <String>} 

That is, this object does not have the age attribute, within its data field. Now when I run the query above, it still works fine, but this object (the one without age attribute), is at the very top. This is because of one of two reasons:

  • Since its age is null, its sent to the top because of the descending order_by function.
  • Its the object I last created, so it was always at the beginning, but because it does not have the age attribute, its simply not being affected by the order_by function, and it stays in its original position.

What I actually want to do is send all the objects that don't have the age attribute in their data field to the very end of the queryset.

I tried the union method by creating 2 querysets(one where age is not null, and one where it is) and joined them using | operator. This did not work because the ordering got messed up. I also tried this weird method I found in another question (which also did not work):

qs = Person.objects.annotate(age=RawSQL("(data->>'age')::int", [])).extra(select={'is_top': "age__isnull=True"})
qs = qs.extra(order_by('-is_top')

Link to the weird solution that did not work

Anyway, is there any way to do this that does not involve lists, itertools, and chains? Because I have heard they can be pretty slow at times.

Thanks!

Note: Please no answers about normalizing the database for these queries instead of using JSONFields. I am well aware on the benefits of normalization, but for my use case, it has to be a JSONField.


Solution

  • In case the key lookup fails, the result is NULL as specified in the PostgreSQL documentation:

    Note: There are parallel variants of these operators for both the json and jsonb types. The field/element/path extraction operators return the same type as their left-hand input (either json or jsonb), except for those specified as returning text, which coerce the value to text. The field/element/path extraction operators return NULL, rather than failing, if the JSON input does not have the right structure to match the request; for example if no such element exists.

    You can do that, you simply order with .desc(nulls_last=True) [Django-doc]:

    from django.db.models import F, RawSQL
    from .models import Person
    
    qs = Person.objects.annotate(
        age=RawSQL("(data->>'age')::int", [])
    ).order_by(F('age').desc(nulls_last=True))

    This will order the elements by:

    -- SQL query
    ORDER BY age IS NULL, age DESC

    So by first ordering by age IS NULL, this will result in TRUE, and as a result, it is ordered at the bottom of the result table.