Search code examples
pythonsqldjangopostgresqlgenerate-series

Filtering a query that uses generate_series while keeping null rows


I'm performing a raw query in Django in order to make use of PostgreSQL's generate_series function and get rows back for each date in an interval, comparing the generated dates with datetime ranges (order_dates) in the BaseEntry model (something not really possible directly in the ORM).

I'm using an unmanaged model:

class OrderBinnedStat(models.Model):
    bin = models.DateTimeField()
    id_count = models.IntegerField()
    avg_flow = models.IntegerField()
    sum_flow = models.IntegerField()

    class Meta:
        managed = False

And running the following code to query and view the result:

from django.utils import timezone


timezone.activate("UTC")

end_dt = timezone.datetime(year=2021, month=12, day=26, hour=0, minute=0, second=0)
end_dt = timezone.make_aware(end_dt)

start_dt = end_dt - timezone.timedelta(days=1)
minutes = 60

report = OrderBinnedStat.objects.raw("""
SELECT row_number() OVER () AS id,
       dt_range.bin,
       count(DISTINCT t.id) AS id_count,
       avg(DISTINCT t.flow) AS avg_flow,
       sum(DISTINCT t.flow) AS sum_flow
FROM (
    SELECT generate_series(%s, %s - interval '1 milliseconds', interval '%s min')
    FROM public.orders_baseentry t -- is this line needed?
    ) dt_range(bin)
LEFT JOIN public.orders_baseentry t ON t.order_dates @> dt_range.bin
-- WHERE t.status = ANY(ARRAY['new', 'cancelled'])
GROUP  BY dt_range.bin
ORDER  BY dt_range.bin;
""", [start_dt, end_dt, minutes])

for item in report:
    print(item.id, item.bin, item.id_count, item.avg_flow, item.sum_flow)

It works great when I'm not filtering the baseentry table. I get 24 rows, as expected (one for each hour of the requested interval). There are no order_dates that overlap with 2021-12-25 21:00:00+00:00 and later, so the values are all 0 or None for those rows. Perfect!

1 2021-12-25 00:00:00+00:00 1 0.65128747160000000000 0.6512874716
2 2021-12-25 01:00:00+00:00 1 0.65128747160000000000 0.6512874716
3 2021-12-25 02:00:00+00:00 1 0.65128747160000000000 0.6512874716
4 2021-12-25 03:00:00+00:00 1 0.65128747160000000000 0.6512874716
5 2021-12-25 04:00:00+00:00 1 0.65128747160000000000 0.6512874716
6 2021-12-25 05:00:00+00:00 1 0.65128747160000000000 0.6512874716
7 2021-12-25 06:00:00+00:00 1 0.65128747160000000000 0.6512874716
8 2021-12-25 07:00:00+00:00 1 0.65128747160000000000 0.6512874716
9 2021-12-25 08:00:00+00:00 1 0.65128747160000000000 0.6512874716
10 2021-12-25 09:00:00+00:00 1 0.65128747160000000000 0.6512874716
11 2021-12-25 10:00:00+00:00 1 0.65128747160000000000 0.6512874716
12 2021-12-25 11:00:00+00:00 1 0.65128747160000000000 0.6512874716
13 2021-12-25 12:00:00+00:00 1 0.65128747160000000000 0.6512874716
14 2021-12-25 13:00:00+00:00 1 0.65128747160000000000 0.6512874716
15 2021-12-25 14:00:00+00:00 1 0.65128747160000000000 0.6512874716
16 2021-12-25 15:00:00+00:00 1 0.65128747160000000000 0.6512874716
17 2021-12-25 16:00:00+00:00 1 0.65128747160000000000 0.6512874716
18 2021-12-25 17:00:00+00:00 1 0.65128747160000000000 0.6512874716
19 2021-12-25 18:00:00+00:00 1 0.65128747160000000000 0.6512874716
20 2021-12-25 19:00:00+00:00 1 0.65128747160000000000 0.6512874716
21 2021-12-25 20:00:00+00:00 1 0.65128747160000000000 0.6512874716
22 2021-12-25 21:00:00+00:00 0 None None
23 2021-12-25 22:00:00+00:00 0 None None
24 2021-12-25 23:00:00+00:00 0 None None

But if I uncomment the WHERE clause to start filtering (or add any other filtering), I get the following result:

1 2021-12-25 00:00:00+00:00 1 0.65128747160000000000 0.6512874716
2 2021-12-25 01:00:00+00:00 1 0.65128747160000000000 0.6512874716
3 2021-12-25 02:00:00+00:00 1 0.65128747160000000000 0.6512874716
4 2021-12-25 03:00:00+00:00 1 0.65128747160000000000 0.6512874716
5 2021-12-25 04:00:00+00:00 1 0.65128747160000000000 0.6512874716
6 2021-12-25 05:00:00+00:00 1 0.65128747160000000000 0.6512874716
7 2021-12-25 06:00:00+00:00 1 0.65128747160000000000 0.6512874716
8 2021-12-25 07:00:00+00:00 1 0.65128747160000000000 0.6512874716
9 2021-12-25 08:00:00+00:00 1 0.65128747160000000000 0.6512874716
10 2021-12-25 09:00:00+00:00 1 0.65128747160000000000 0.6512874716
11 2021-12-25 10:00:00+00:00 1 0.65128747160000000000 0.6512874716
12 2021-12-25 11:00:00+00:00 1 0.65128747160000000000 0.6512874716
13 2021-12-25 12:00:00+00:00 1 0.65128747160000000000 0.6512874716
14 2021-12-25 13:00:00+00:00 1 0.65128747160000000000 0.6512874716
15 2021-12-25 14:00:00+00:00 1 0.65128747160000000000 0.6512874716
16 2021-12-25 15:00:00+00:00 1 0.65128747160000000000 0.6512874716
17 2021-12-25 16:00:00+00:00 1 0.65128747160000000000 0.6512874716
18 2021-12-25 17:00:00+00:00 1 0.65128747160000000000 0.6512874716
19 2021-12-25 18:00:00+00:00 1 0.65128747160000000000 0.6512874716
20 2021-12-25 19:00:00+00:00 1 0.65128747160000000000 0.6512874716
21 2021-12-25 20:00:00+00:00 1 0.65128747160000000000 0.6512874716
  • Should my WHERE clause be placed somewhere else?
  • What am I doing wrong here?

I need to be able to filter the baseentry table while also keeping the null rows.


Solution

  • Replace WHERE t.status = ANY(ARRAY['new', 'cancelled']) with AND t.status = ANY(ARRAY['new', 'cancelled']), and it works correctly!