I have a table 'post' which has millions of rows. I need top N records using the date field but it's taking too much time because of Top-N-heapsort. How can I optimize this query?
Note: On the staging server where I have less data it's working fine. Index scan is applying there.
DB Version: Postgres 12
CREATE TABLE public.posts
(
post_id integer NOT NULL DEFAULT (primary key),
team_id integer NOT NULL,
is_active boolean DEFAULT true,
"createdAt" timestamp with time zone,
"updatedAt" timestamp with time zone,
)
Index column are:
"createdAt DESC NULLS Last",
(team_id ASC NULLS LAST, "createdAt" DESC NULLS LAST),
team_id ASC NULLS LAST
QUERY:
SELECT p.*
FROM posts p
WHERE team_id = 1
AND p.is_active = true AND "createdAt" IS NOT NULL
ORDER BY "createdAt" DESC
LIMIT 20;
Query plan:
"Limit (cost=138958.67..138958.72 rows=20 width=360) (actual time=356.391..356.419 rows=20 loops=1)"
" -> Sort (cost=138958.67..139078.57 rows=47960 width=360) (actual time=356.389..356.402 rows=20 loops=1)"
" Sort Key: ""createdAt"" DESC"
" Sort Method: top-N heapsort Memory: 34kB"
" -> Index Scan using posts_team_id on posts p (cost=0.44..137682.47 rows=47960 width=360) (actual time=0.042..317.258 rows=52858 loops=1)"
" Index Cond: (team_id = 1)"
" Filter: (is_active AND (""createdAt"" IS NOT NULL))"
"Planning Time: 0.145 ms"
"Execution Time: 356.459 ms"
This is the index I would use:
CREATE INDEX ON posts (team_id, "createdAt") WHERE is_active;
It supports the WHERE
condition and the ORDER BY
.