Search code examples
djangodjango-modelsdjango-querysetdjango-filters

Django diffrenet results from AND operation and sequential filter


If I have models like this:

class Person(models.Model):
    first_name = models.CharField(max_length=20)
    last_name = models.CharField(max_length=20)

class PersonSession(models.Model):
    start_time = models.DateTimeField(auto_now_add=True)
    end_time = models.DateTimeField(null=True,
                                    blank=True)
    person = models.ForeignKey(Person, related_name='sessions')

class Billing(models.Model):
    DEBT = 'DE'
    BALANCED = 'BA'
    CREDIT = 'CR'

    session = models.OneToOneField(PersonSession,
                                   blank=False,
                                   null=False,
                                   related_name='billing')
    STATUS = ((BALANCED, 'Balanced'),
              (DEBT, 'Debt'),
              (CREDIT, 'Credit'))

    status = models.CharField(max_length=2,
                              choices=STATUS,
                              blank=False,
                              default=BALANCED
                              )

The two below query generates different results:

Person.objects.filter(Q(sessions__start_time__gte='2000-02-01') & \
                      Q(sessions__start_time__lte='2000-03-01') & \
                      Q(sessions__billing__status=Billing.DEBT))
OR
Person.objects.filter(Q(sessions__start_time__gte='2000-02-01') & \
                      Q(sessions__start_time__lte='2000-03-01')).filter(
                      Q(sessions__billing__status=Billing.DEBT))

The first one generates two person 1,2 and the second one generates all three persons that I have, the data is as below:

id | first_name | last_name | id |        start_time         |         end_time          | person_id | id | status | session_id 
---+------------+-----------+----+---------------------------+---------------------------+-----------+----+--------+------------
0 | person     | 0         |  0 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 |         0 |  0 | DE     |          0
0 | person     | 0         |  1 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 |         0 |  1 | BA     |          1
0 | person     | 0         |  2 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 |         0 |  2 | DE     |          2
1 | person     | 1         |  3 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 |         1 |  3 | BA     |          3
1 | person     | 1         |  4 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 |         1 |  4 | DE     |          4
1 | person     | 1         |  5 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 |         1 |  5 | DE     |          5
2 | person     | 2         |  6 | 2000-01-01 16:32:00+03:30 | 2000-01-01 17:32:00+03:30 |         2 |  6 | DE     |          6
2 | person     | 2         |  7 | 2000-02-01 16:32:00+03:30 | 2000-02-01 17:32:00+03:30 |         2 |  7 | DE     |          7
2 | person     | 2         |  8 | 2000-03-01 16:32:00+03:30 | 2000-03-01 17:32:00+03:30 |         2 |  8 | BA     |          8

I wrote an example that creates class and data for them to show the difference.

Please explain what is difference between these two queries and as a side question if in django-filter package I want to define some fields to work as a first filter how could I do this?


Solution

  • The queries generated by the filters are different as follows:

    >>> print Person.objects.filter(Q(sessions__start_time__gte='2000-02-01') & \
    ...                       Q(sessions__start_time__lte='2000-03-01')).filter(
    ...                       Q(sessions__billing__status=Billing.DEBT)).query
    SELECT "test_filter_person"."id", "test_filter_person"."first_name", "test_filter_person"."last_name" FROM "test_filter_person" INNER JOIN "test_filter_personsession" ON ("test_filter_person"."id" = "test_filter_personsession"."person_id") INNER JOIN "test_filter_personsession" T3 ON ("test_filter_person"."id" = T3."person_id") INNER JOIN "test_filter_billing" ON (T3."id" = "test_filter_billing"."session_id") WHERE ("test_filter_personsession"."start_time" >= 2000-02-01 00:00:00+00:00 AND "test_filter_personsession"."start_time" <= 2000-03-01 00:00:00+00:00 AND "test_filter_billing"."status" = DE)
    >>> print Person.objects.filter(Q(sessions__start_time__gte='2000-02-01') & \
    ...                       Q(sessions__start_time__lte='2000-03-01') & \
    ...                       Q(sessions__billing__status=Billing.DEBT)).query
    SELECT "test_filter_person"."id", "test_filter_person"."first_name", "test_filter_person"."last_name" FROM "test_filter_person" INNER JOIN "test_filter_personsession" ON ("test_filter_person"."id" = "test_filter_personsession"."person_id") INNER JOIN "test_filter_billing" ON ("test_filter_personsession"."id" = "test_filter_billing"."session_id") WHERE ("test_filter_personsession"."start_time" >= 2000-02-01 00:00:00+00:00 AND "test_filter_personsession"."start_time" <= 2000-03-01 00:00:00+00:00 AND "test_filter_billing"."status" = DE)
    

    Let's call them Q1 and Q2 respectively.

    As you can see, Q1 does the join between Person, PersonSession, and Billing once, and all the conditions are applied together in the WHERE clause.

    However, in Q2, since you terminate the first filter with the first two checks, this results in a first query join between Person and PersonSession first (for Person.objects.filter(Q(sessions__start_time__gte='2000-02-01') & Q(sessions__start_time__lte='2000-03-01')), the result this first join is again joined with PersonSession and Billing in order to execute the second part of the filter query .filter(Q(sessions__billing__status=Billing.DEBT)).

    You can read more here: difference between filter with multiple arguments and chain filter in django

    Also want to draw your attention to the last line of the post:

    One table: But if the query doesn't involve joined tables like the example from Yuji and DTing. The result is same.