Search code examples
sqlpostgresqlindexingslickpostgresql-performance

Bidirectional index


Is there way for bidirectional index (for effective ordering ASC/DESC)?

There is some table:

CREATE TABLE t1(
   id VARCHAR NOT NULL PRIMARY KEY, 
   d TIMESTAMP)

and there is DESC index for d field:

CREATE INDEX d_index ON t1 (d DESC);

Because d_index is DESC then ordering by desc will be more effective than ordering by asc.


UPDATE The above is abstract example. A real schema are:

CREATE TABLE user (id VARCHAR NOT NULL PRIMARY KEY);

CREATE TABLE event(
   id VARCHAR NOT NULL PRIMARY KEY,
   author VARCHAR REFERENCES user(id),
   created TIMESTAMP without time zone,
   param1 VARCHAR,
   param2 VARCHAR,
   param3 VARCHAR);
CREATE INDEX event_author_index ON event (author);
CREATE INDEX event_created_index ON event (created);

CREATE TABLE subscribe (
   id VARCHAR NOT NULL PRIMARY KEY,
   uid VARCHAR REFERENCES user(id),
   target_uid VARCHAR REFERENCES user(id));

CREATE INDEX subscribe_uid_index ON subscribe (uid, target_uid);
  • users count ~ 1.5 Million
  • event count ~ 1.0 Million
  • subscribe count ~ 1.2 Million

Queries was generated by typesafe slick 3 (scala).

DESC ordering (very slow):

 explain analyze 
   select x2.x3, x2.x4 from 
    (select x5."id" as x3, x5."created" as x4 from "subscribe" x6, "event" x5 where 
       (x6."uid" = 'u1') and (x5."author" = x6."target_uid") 
   order by x5."created" desc limit 10) x2;

Limit  (cost=0.85..30.08 rows=10 width=28) (actual time=11629.178..11629.289 rows=10 loops=1)
   Output: x5.id, x5.created
   ->  Nested Loop  (cost=0.85..529307.30 rows=181083 width=28) (actual time=11629.177..11629.284 rows=10 loops=1)
         Output: x5.id, x5.created
         ->  Index Scan Backward using event_created_index on public.event x5  (cost=0.42..39295.00 rows=1002105 width=40) (actual time=38.574..8828.120 rows=923101 loops=1)
               Output: x5.id, x5.created, x5.author, x5.param1, x5.param2, x5.param3   
         ->  Index Only Scan using subscribe_uid_index on public.subscribe x6  (cost=0.43..0.48 rows=1 width=14) (actual time=0.002..0.002 rows=0 loops=923101)
               Output: x6.uid, x6.target_uid
               Index Cond: ((x6.uid = 'u1'::text) AND (x6.target_uid = (x5.author)::text))
               Heap Fetches: 0
 Planning time: 121.017 ms
 Execution time: 11629.749 ms

ASC ordering (the same query):

explain analyze 
  select x2.x3, x2.x4 from 
    (select x5."id" as x3, x5."created" as x4 from "subscribe" x6, "event" x5 where 
      (x6."uid" = 'u1') and (x5."author" = x6."target_uid") 
  order by x5."created" limit 10) x2;

 Limit  (cost=0.85..30.08 rows=10 width=28) (actual time=453.712..453.813 rows=10 loops=1)
   ->  Nested Loop  (cost=0.85..529307.30 rows=181083 width=28) (actual time=453.710..453.807 rows=10 loops=1)
         ->  Index Scan using event_created_index on event x5  (cost=0.42..39295.00 rows=1002105 width=40) (actual time=31.938..214.687 rows=79015 loops=1)
         ->  Index Only Scan using subscribe_uid_index on subscribe x6  (cost=0.43..0.48 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=79015)
               Index Cond: ((uid = 'u1'::text) AND (target_uid = (x5.author)::text))
               Heap Fetches: 0
 Planning time: 121.426 ms
 Execution time: 454.235 ms

UPD 2 (the results from answer):

DROP INDEX event_author_index;
DROP INDEX event_created_index;
CREATE INDEX event_c1_index ON event (author, created);
REINDEX TABLE event;

DESC ordering plan (for ASC is the same):

 Limit  (cost=36782.56..36782.58 rows=10 width=28) (actual time=2186.408..2186.412 rows=10 loops=1)
   ->  Sort  (cost=36782.56..37235.26 rows=181083 width=28) (actual time=2186.407..2186.408 rows=10 loops=1)
         Sort Key: x5.created
         Sort Method: quicksort  Memory: 25kB
         ->  Merge Join  (cost=30037.41..32869.42 rows=181083 width=28) (actual time=2186.352..2186.374 rows=10 loops=1)
               Merge Cond: ((x5.author)::text = (x6.target_uid)::text)
               ->  Index Scan using event_c1_index on event x5  (cost=0.42..65573.44 rows=1002105 width=40) (actual time=38.211..112.868 rows=2101 loops=1)
               ->  Sort  (cost=30036.99..30037.57 rows=233 width=14) (actual time=2072.850..2072.852 rows=6 loops=1)
                     Sort Key: x6.target_uid
                     Sort Method: quicksort  Memory: 25kB
                     ->  Seq Scan on subscribe x6  (cost=0.00..30027.83 rows=233 width=14) (actual time=0.010..2072.823 rows=2 loops=1)
                           Filter: ((uid)::text = 'u1'::text)
                           Rows Removed by Filter: 1214224
 Planning time: 118.962 ms
 Execution time: 2186.460 ms

Performance is increased. But the cost is increased drastically. It is normal?


Solution

  • The index order in your query does not matter, if you added index with DSEC order the result would be the same. What matters is loops=79,015 vs. loops=923,101. Your data is such that pg must do over 10 times more checks on the subscribe_uid_index to get the desired number of results.

    Try instead:

    CREATE INDEX subscribe_target_uid_index ON subscribe (target_uid, uid);
    

    Or even:

    CREATE INDEX subscribe_target_uid_index_f ON subscribe (target_uid) WHERE "uid" = 'u1'