I have a table with DDL:
CREATE TABLE public.test_table (
    id serial4 NOT NULL,
    "date" int8 NOT NULL,
    user_id int4 NOT NULL,
    device_id int4 NULL,
    CONSTRAINT test_table_date_user_id_device_id_key UNIQUE (date, user_id, device_id),
    CONSTRAINT test_table_pkey PRIMARY KEY (id)
);
ALTER TABLE public.test_table ADD CONSTRAINT test_table FOREIGN KEY (device_id) REFERENCES public.t_device(id);
ALTER TABLE public.test_table ADD CONSTRAINT test_table_user_id_fkey FOREIGN KEY (user_id) REFERENCES public.t_user(id) ON DELETE CASCADE;
Sample:
| id | date | user_id |
| -------- | -------------- | ------- |
| 1886 | 1716625890 | 5 |
| 1887 | 1716626430 | 5 |
| 1888 | 1716627030 | 5 |
And I need to find the row count grouped by day, and also print the first and the last date within each day. As input I have two dates, `from` and `to`, and a `user_id`.
I'm trying to use the following SQL:
with calendar as (
    select d
    from generate_series(to_timestamp(:FROM)::date,
                         to_timestamp(:TO)::date,
                         interval '1 day') d
)
select
    c.d::date      as item_date,
    count(dt.id)   as item_count,
    min(dt."date") as item_min,
    max(dt."date") as item_max
from test_table dt
left join calendar c
       on to_timestamp(dt.date)::date >= c.d
      and to_timestamp(dt.date)::date < c.d + interval '1 day'
where dt.user_id = 5
group by c.d
order by c.d;
My current output:
|item_date |item_count|item_min |item_max |
|----------|----------|-------------|-------------|
|2024-05-25|144 |1,716,584,490|1,716,670,230|
|2024-05-26|144 |1,716,670,816|1,716,756,630|
|2024-05-27|144 |1,716,757,221|1,716,843,030|
| |4,770 |1,716,286,230|1,719,514,890|
`explain(analyze, verbose, buffers, settings)` result:
Sort (cost=199067.86..199068.36 rows=200 width=36) (actual time=10.152..10.154 rows=3 loops=1)
  Output: ((d.d)::date), (count(dt.id)), (min(dt.date)), (max(dt.date)), d.d
  Sort Key: d.d
  Sort Method: quicksort  Memory: 25kB
  Buffers: shared hit=363
  ->  HashAggregate (cost=199057.72..199060.22 rows=200 width=36) (actual time=10.144..10.146 rows=3 loops=1)
        Output: (d.d)::date, count(dt.id), min(dt.date), max(dt.date), d.d
        Group Key: d.d
        Buffers: shared hit=363
        ->  Nested Loop (cost=0.30..192966.61 rows=609111 width=20) (actual time=0.292..10.074 rows=432 loops=1)
              Output: d.d, dt.id, dt.date
              Join Filter: (((to_timestamp((dt.date)::double precision))::date >= d.d) AND ((to_timestamp((dt.date)::double precision))::date < (d.d + '1 day'::interval)))
              Rows Removed by Join Filter: 15174
              Buffers: shared hit=363
              ->  Function Scan on pg_catalog.generate_series d (cost=0.01..10.01 rows=1000 width=8) (actual time=0.006..0.008 rows=3 loops=1)
                    Output: d.d
                    Function Call: generate_series((('2024-05-25 11:31:30+03'::timestamp with time zone)::date)::timestamp with time zone, (('2024-05-27 14:50:30+03'::timestamp with time zone)::date)::timestamp with time zone, '1 day'::interval)
              ->  Materialize (cost=0.29..1100.30 rows=5482 width=12) (actual time=0.011..1.198 rows=5202 loops=3)
                    Output: dt.id, dt.date
                    Buffers: shared hit=363
                    ->  Index Scan using test_table_date_user_id_device_id_key on public.test_table dt (cost=0.29..1072.89 rows=5482 width=12) (actual time=0.032..2.070 rows=5202 loops=1)
                          Output: dt.id, dt.date
                          Index Cond: (dt.user_id = 5)
                          Buffers: shared hit=363
Settings: effective_cache_size = '1377MB', effective_io_concurrency = '200', max_parallel_workers = '1', random_page_cost = '1.1', search_path = 'public, public, "$user"', work_mem = '524kB'
Planning Time: 0.147 ms
Execution Time: 10.264 ms
Is the query optimal?
It's not. The CTE is unnecessary - you're already turning your unix timestamp into a date with `to_timestamp(dt.date)::date`. I don't see a reason to generate mismatched dates with `generate_series()`.
Is it possible to remove the last total row in the result?
It's there because 4,770 rows in `test_table` carry dates that don't match anything you generated in the CTE, so they all collapse into the null group. If you remove the CTE, you'll get rid of that last line and shorten, simplify and speed up your query:
demo at db<>fiddle
select
    to_timestamp(dt.date)::date as item_date,
    count(dt.id)                as item_count,
    min(dt."date")              as item_min,
    max(dt."date")              as item_max
from test_table dt
where dt.user_id = 5
  --and to_timestamp(dt.date)::date >= to_timestamp(:FROM)::date
  --and to_timestamp(dt.date)::date < to_timestamp(:TO)::date
group by item_date
order by item_date;
| item_date  | item_count | item_min   | item_max   |
|------------|------------|------------|------------|
| 2024-05-24 | 144        | 1716509706 | 1716591459 |
| 2024-05-25 | 144        | 1716595589 | 1716677784 |
| 2024-05-26 | 144        | 1716682103 | 1716764034 |
If `generate_series()` was there to narrow down to a target period (of what looks like 13 years), you can uncomment the two added conditions above.
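For example, with the two conditions switched on (just a sketch; the epoch literals below only stand in for whatever `:FROM` and `:TO` get bound to):

```sql
select
    to_timestamp(dt."date")::date as item_date,
    count(dt.id)                  as item_count,
    min(dt."date")                as item_min,
    max(dt."date")                as item_max
from test_table dt
where dt.user_id = 5
  and to_timestamp(dt."date")::date >= to_timestamp(1716584490)::date  -- :FROM
  and to_timestamp(dt."date")::date <  to_timestamp(1716843030)::date  -- :TO
group by item_date
order by item_date;
```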
Both `bigint` and `timestamp(tz)` are 8 bytes. If you have control over the schema, changing the column type won't cost any additional space, and it'll make the column easier to work with, enabling all built-in date/time functions and operators without the need to run it through `to_timestamp()` every time.
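A minimal sketch of that migration, assuming `"date"` really holds unix-epoch seconds (note that the statement rewrites the table under an exclusive lock):

```sql
-- Sketch only: convert the unix-epoch bigint to timestamptz in place.
-- Assumes "date" stores seconds since the epoch.
ALTER TABLE public.test_table
    ALTER COLUMN "date" TYPE timestamptz
    USING to_timestamp("date");
```

After that, `item_date` is just `dt."date"::date`, and the range filter can compare directly against `timestamptz` values.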
Nitpicking, because `dt.id` is likely a primary key implying `not null`, but it's good to make sure you're intentionally not counting rows where `dt.id` is null:
count ( * ) → bigint
Computes the number of input rows.
count ( "any" ) → bigint
Computes the number of input rows in which the input value is not null.
When you use a `full` or `right` join, even though the column in the left table isn't nullable, it'll come out as `null` in every output row for which that table had no match.