Search code examples
sqlruby-on-railsruby-on-rails-3postgresqlpostgresql-performance

Order BY turns a 30ms query into a 7120ms query. Known performance issue?


I have a User table with 1m records:

User (id, fname, lname, deleted_at, guest)

I have the following query which is being run against a postgres 9.1 db:

SELECT "users".* 
FROM "users" 
WHERE (users.deleted_at IS NULL) AND (SUBSTRING(lower(fname), 1, 1) = 's') 
ORDER BY guest = false, fname ASC 
LIMIT 25 OFFSET 0

Using pgAdmin 3, this SQL is taking 7120ms to return 25 rows. If I remove the 'ORDER BY guest = false, fname ASC' the query takes just 31ms.

I have the following indexes:

add_index "users", ["fname"], :name => "index_users_on_fname"
add_index "users", ["guest", "fname"], :name => "index_users_on_guest_and_fname"
add_index "users", ["deleted_at"], :name => "index_users_on_deleted_at"
add_index "users", ["guest"], :name => "index_users_on_guest"

Any ideas? Thank you!

UPDATED with Explain

"Limit  (cost=43541.55..43541.62 rows=25 width=1612) (actual time=1276.777..1276.783 rows=25 loops=1)"
"  ->  Sort  (cost=43541.55..43558.82 rows=6905 width=1612) (actual time=1276.775..1276.777 rows=25 loops=1)"
"        Sort Key: ((NOT guest)), fname"
"        Sort Method: top-N heapsort  Memory: 37kB"
"        ->  Seq Scan on users  (cost=0.00..43346.70 rows=6905 width=1612) (actual time=5.143..1272.563 rows=475 loops=1)"
"              Filter: ((deleted_at IS NULL) AND pubic_profile_visible AND ((fname)::text ~~ 's%'::text))"
"Total runtime: 1276.967 ms"

Solution

  • First, since PostgreSQL 9.1 you can use left() to simplify the expression:

    substring(lower(fname), 1, 1)
    lower(left(fname, 1)) -- equivalent, but simpler and faster
    

    Also slightly faster to take the first character before casting to lower case.
    Next, clean up the query:

    SELECT * 
    FROM   users 
    WHERE  deleted_at IS NULL
    AND    lower(left(fname, 1)) = 's'
    ORDER  BY guest DESC NULLS LAST, fname
    LIMIT  25 OFFSET 0;
    

    guest DESC NULLS LAST results in the same as guest = FALSE, just without calculating a new value for every row.
    Next, create this multi-column partial index:

    CREATE INDEX users_multi_idx
    ON users (lower(left(fname, 1)), guest DESC NULLS LAST, fname)
    WHERE deleted_at IS NULL;
    

    Run

    ANALYZE users;
    

    Or, even better, CLUSTER (if you don't have more important queries requiring a different order) - and then ANALYZE:

    CLUSTER users using users_multi_idx;
    

    And it will be way faster than anything you tried before. Because now, the query reads rows from the index sequentially and the table has been physically rewritten in the same order, resulting in only few page hits ...