I am little bit confused and need some advice. I use PostgreSQL 11
database. I have such pretty simple sql statement:
SELECT DISTINCT "CITY", "AREA", "REGION"
FROM youtube
WHERE
"CITY" IS NOT NULL
AND
"AREA" IS NOT NULL
AND
"REGION" IS NOT NULL
youtube
table which I use in sql statement has 25 million records. I think for thats why query takes 15-17 seconds to complete. For web project where I use that query it's too long. I'm trying to speed up the request.
I create such index for youtube table:
CREATE INDEX youtube_location_idx ON public.youtube USING btree ("CITY", "AREA", "REGION");
After this step I run query again but it takes the same time to complete. It seems like query don't use index. How do I know if any index is used in a query?
You answered the question in the title yourself by running EXPLAIN
. The query plan shows which indexes are used and how. For details see the chapter "Using EXPLAIN
" in the manual.
As for why the query uses a sequential scan and no indexes: 25 million row, 992781 rows removed
. You are fetching 24709900 rows
, that's almost all rows.
This is never going to be fast. And it's only going to use an index in special situations.
Using an index typically only makes sense for a small fraction of all rows. Else it would just add additional cost. Depending on a number of co-factors, the Postgres query planner starts to consider a btree index for around 5% of all rows or less. Related:
Well, if your table rows are substantially wider than the three columns in your SELECT
list, a (partial) covering index might help somewhat if you get index-only scans out of it. Again, needs to meet some preconditions. And every index also adds storage and maintenance costs.
Aside: A comment claimed NULL
values couldn't be indexed. This is incorrect, NULL
values can be indexed. Not as efficient as other values, but doesn't make much of a difference. Also irrelevant to the case at hand.