Search code examples
ruby-on-railspostgresqlindexingruby-paranoia

Multi-column index vs seperate indexes vs partial indexes


While working on my Rails app today I noticed that the paranoia gem says that indexes should be updated to add the deleted_at IS NOT NULL as a where on the index creation (github link). But It occurred to me that the inverted condition when I do want with_deleted, won't benefit from the index.

This makes me wonder...

I know that this is somewhat obtuse because the answer is obviously "it depends on what you need" but I am trying to get an idea of the differences between Multi-column index vs separate indexes vs partial indexes on my web app backed by PostgreSQL.

Basically, I have 2 fields that I am querying on: p_id and deleted_at. Most of the time I am querying WHERE p_id=1 AND deleted_at IS NOT NULL - but sometimes I only query WHERE p_id=1. Very seldom, I will WHERE p_id=1 AND deleted_at=1/1/2017.

So, Am I better off:

  1. Having an index on p_id and a separate index on deleted_at?
  2. Having an index on p_id but add 'where deleted_at IS NOT NULL'?
  3. Having a combined index on p_id and deleted_at together?

Note: perhaps I should mention that p_id is currently a foreign key reference to p.id. Which reminds me, in Postgres, is it necessary for foreign keys to also have indexes (or do they get an index derived from being a foreign key constraint - I've read conflicting answers on this)?


Solution

  • The answer depends on

    • how often you use each of these queries, and how long they are allowed to run
    • if query speed is important enough that slow data changes can be tolerated.

    The perfect indexes for the three clauses are:

    1. WHERE p_id=1 AND deleted_at IS NOT NULL

      CREATE INDEX ON mytable (p_id) WHERE deleted_at IS NOT NULL;
      
    2. WHERE p_id=1 AND deleted_at=1/1/2017

      CREATE INDEX ON mytable (p_id, deleted_at);
      
    3. WHERE p_id=1

      CREATE INDEX ON mytable (p_id);
      

    The index created for 2. can also be used for 3., so if you need to speed up the second query as much as possible and a slightly bigger index doesn't bother you, create only the index from 2. for both queries.

    However, the index from 3. will also speed up the query in 2., just not as much as possible, so if you can live with a slightly worse performance for the query in 2. and want the index as small and efficient as possible for the query in 3., create only the index in 3.

    I would not create both the indexes from 2. an 3.; you should pick what is best for you.

    The case with 1. is different, because that index can only be used for the first query. Create that index only if you want to speed up that query as much as possible, and it doesn't matter if data modifications on the table take longer, because an additional index has to be maintained.

    Another indication to create the index in 1. is if only a small percentage of rows satisfies deleted_at IS NOT NULL. If not, the index in 1. doesn't have a great advantage over the one in 3., and you should just create the latter.

    Having two separate indexes on the two columns is probably not the best choice – they can be used in combination only with a bitmap index scan, and it may well be that PostgreSQL only chooses to use one of the indexes (depends on the distribution, but probably the one on p_id), and the other one is useless.