I have a table like:
CREATE TABLE summary
(
id serial NOT NULL,
user_id bigint NOT NULL,
country character varying(5),
product_id bigint NOT NULL,
category_id bigint NOT NULL,
text_id bigint NOT NULL,
text character varying(255),
product_type integer NOT NULL,
event_name character varying(255),
report_date date NOT NULL,
currency character varying(5),
revenue double precision,
last_event_time timestamp
);
My table size is 1786 MB (except index). In here, I've created index like below:
CREATE INDEX "idx_as_type_usr_productId_eventTime"
ON summary USING btree
(product_type, user_id, product_id, last_event_time)
INCLUDE(event_name);
And my simple query looks like below:
select
event_name,
max(last_event_time)
from summary s
where s.user_id = ? and s.product_id = ? and s.product_type = ?
and s.last_event_time > '2020-03-01' and s.last_event_time < '2020-03-25'
group by event_name;
When I explain it, it looks like;
HashAggregate (cost=93.82..96.41 rows=259 width=25) (actual time=9187.533..9187.536 rows=10 loops=1)
Group Key: event_name
Buffers: shared hit=70898 read=10579 dirtied=22650
I/O Timings: read=3876.367
-> Index Only Scan using "idx_as_type_usr_productId_eventTime" on summary s (cost=0.56..92.36 rows=292 width=25) (actual time=0.485..9153.812 rows=87322 loops=1)
Index Cond: ((product_type = 2) AND (product_id = ?) AND (product_id = ?) AND (last_event_time > '2020-03-01 00:00:00'::timestamp without time zone) AND (last_event_time < '2020-03-25 00:00:00'::timestamp without time zone))
Heap Fetches: 35967
Buffers: shared hit=70898 read=10579 dirtied=22650
I/O Timings: read=3876.367
Planning Time: 0.452 ms
Execution Time: 9187.583 ms
In here, everything looks fine. But when I execute it, it takes more than 10 seconds, sometime it takes more than 30 seconds.
Updated:
Query Without - GroupBy:
select
event_name,
last_event_time
from summary s
where s.user_id = ? and s.product_id = ? and s.product_type = ?
and s.last_event_time > '2020-03-01' and s.last_event_time < '2020-03-25';
Explain Without - Group By:
Index Only Scan using "idx_as_type_usr_productId_eventTime" on summary s (cost=0.56..92.36 rows=292 width=25) (actual time=0.023..79.138 rows=87305 loops=1)
Index Cond: ((product_type = ?) AND (user_id = ?) AND (product_id = ?) AND (last_event_time > '2020-03-01 00:00:00'::timestamp without time zone) AND (last_event_time < '2020-03-25 00:00:00'::timestamp without time zone))
Heap Fetches: 22949
Buffers: shared hit=37780 read=12143 dirtied=15156
I/O Timings: read=4418.930
Planning Time: 0.639 ms
Execution Time: 4625.213 ms
There are several problems:
PostgreSQL had to set hint bits, which dirty the pages and cause writes.
PostgreSQL has to fetch table rows from disk to fetch their visibility.
PostgreSQL has to scan 80000 pages to get 87000 rows, so the index must be totally bloated.
The first two can be taken care of by running
VACUUM summary;
which is always a good idea after a bulk load, and the bloat can be cured by
REINDEX INDEX "idx_as_type_usr_productId_eventTime";