I have a User table with 1m records:
User (id, fname, lname, deleted_at, guest)
I have the following query which is being run against a postgres 9.1 db:
SELECT "users".*
FROM "users"
WHERE (users.deleted_at IS NULL) AND (SUBSTRING(lower(fname), 1, 1) = 's')
ORDER BY guest = false, fname ASC
LIMIT 25 OFFSET 0
Using pgAdmin 3, this SQL is taking 7120ms to return 25 rows. If I remove the 'ORDER BY guest = false, fname ASC' the query takes just 31ms.
I have the following indexes:
add_index "users", ["fname"], :name => "index_users_on_fname"
add_index "users", ["guest", "fname"], :name => "index_users_on_guest_and_fname"
add_index "users", ["deleted_at"], :name => "index_users_on_deleted_at"
add_index "users", ["guest"], :name => "index_users_on_guest"
Any ideas? Thank you!
UPDATED with Explain
"Limit (cost=43541.55..43541.62 rows=25 width=1612) (actual time=1276.777..1276.783 rows=25 loops=1)"
" -> Sort (cost=43541.55..43558.82 rows=6905 width=1612) (actual time=1276.775..1276.777 rows=25 loops=1)"
" Sort Key: ((NOT guest)), fname"
" Sort Method: top-N heapsort Memory: 37kB"
" -> Seq Scan on users (cost=0.00..43346.70 rows=6905 width=1612) (actual time=5.143..1272.563 rows=475 loops=1)"
" Filter: ((deleted_at IS NULL) AND pubic_profile_visible AND ((fname)::text ~~ 's%'::text))"
"Total runtime: 1276.967 ms"
First, since PostgreSQL 9.1 you can use left() to simplify the expression:

substring(lower(fname), 1, 1)
lower(left(fname, 1))  -- equivalent, but simpler and faster

It is also slightly faster to take the first character before lower-casing it, since only a single character is passed to lower().
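As a quick sanity check (assuming a populated users table like yours), you can compare both expressions side by side and confirm they return the same value:

```sql
-- Both expressions yield the same single lower-cased character:
SELECT substring(lower(fname), 1, 1) AS old_expr
     , lower(left(fname, 1))         AS new_expr
FROM   users
LIMIT  5;
```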
Next, clean up the query:
SELECT *
FROM users
WHERE deleted_at IS NULL
AND lower(left(fname, 1)) = 's'
ORDER BY guest DESC NULLS LAST, fname
LIMIT 25 OFFSET 0;
guest DESC NULLS LAST sorts the same way as guest = FALSE, just without computing a new value for every row.
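To see the equivalence for yourself, here is a toy example (a hypothetical temp table, not part of the original schema):

```sql
CREATE TEMP TABLE t (guest boolean, fname text);
INSERT INTO t VALUES (false, 'sam'), (true, 'sue'), (NULL, 'sal');

SELECT * FROM t ORDER BY guest = false, fname;          -- computes an expression per row
SELECT * FROM t ORDER BY guest DESC NULLS LAST, fname;  -- same order, no per-row computation
```

Both queries return the rows in the same order (sue, sam, sal): TRUE sorts first either way, and NULL sorts last either way.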
Next, create this multi-column partial index:
CREATE INDEX users_multi_idx
ON users (lower(left(fname, 1)), guest DESC NULLS LAST, fname)
WHERE deleted_at IS NULL;
Then run:

ANALYZE users;

Or, even better, CLUSTER the table on the new index (if you don't have more important queries requiring a different physical order), and then ANALYZE:
CLUSTER users using users_multi_idx;
And it will be far faster than anything you tried before, because the query now reads rows from the index sequentially, and the table has been physically rewritten in the same order, resulting in only a few page hits.
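To verify that the planner actually uses the new index, re-run the query with EXPLAIN ANALYZE (the WHERE clause and ORDER BY must match the index definition for it to qualify):

```sql
EXPLAIN ANALYZE
SELECT *
FROM   users
WHERE  deleted_at IS NULL
AND    lower(left(fname, 1)) = 's'
ORDER  BY guest DESC NULLS LAST, fname
LIMIT  25;
```

You should now see an Index Scan on users_multi_idx in place of the Seq Scan and top-N heapsort from the original plan.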