Search code examples
djangopostgresqljsonb

Understanding Django JSONField key-path queries and exhaustive sets


While looking at the Django docs on querying JSONField, I came upon a note stating:

Due to the way in which key-path queries work, exclude() and filter() are not guaranteed to produce exhaustive sets. If you want to include objects that do not have the path, add the isnull lookup.

Can someone give me an example of a query that would not produce an exhaustive set? I'm having a pretty hard time coming up with one.


Solution

  • This is the ticket that resulted in the documentation that you quoted: https://code.djangoproject.com/ticket/31894

    TL;DR: To get the inverse of .filter() on a JSON key path, it is not sufficient to only use .exclude() with the same clause since it will only give you records where the JSON key path is present but has a different value and not records where the JSON key path is not present at all. That's why it says:

    If you want to include objects that do not have the path, add the isnull lookup.


    If I may quote the ticket description here:

    Filtering based on a JSONField key-value pair seems to have some unexpected behavior when involving a key that not all records have. Strangely, filtering on an optional property key will not return the inverse result set that an exclude on the same property key will return.

    In my database, I have:

    • 2250 total records 49 records where jsonfieldname = {'propertykey': 'PropertyValue'}
    • 296 records where jsonfieldname has a 'propertykey' key with some other value
    • 1905 records where jsonfieldname does not have a 'propertykey' key at all

    The following code:

    q = Q(jsonfieldname__propertykey="PropertyValue")
    
    total_records = Record.objects.count()
    filtered_records = Record.objects.filter(q).count()
    excluded_records = Record.objects.exclude(q).count()
    filtered_plus_excluded_records = filtered_records + excluded_records
    
    print('Total: %d' % total_records)
    print('Filtered: %d' % filtered_records)
    print('Excluded: %d' % excluded_records)
    print('Filtered Plus Excluded: %d' % filtered_plus_excluded_records)
    

    Will output this:

    Total: 2250
    Filtered: 49
    Excluded: 296
    Filtered Plus Excluded: 345
    

    It is surprising that the filtered+excluded value is not equal to the total record count. It's surprising that the union of a expression plus its inverse does not equal the sum of all records. I am not aware of any other queries in Django that would return a result like this. I realize adding a check that the key exists would return a more expected results, but that doesn't stop the above from being surprising.

    I'm not sure what a solution would be - either a note in the documentation that this behavior should be expected, or take a look at how this same expression is applied for both the exclude() and filter() queries and see why they are not opposites.