Search code examples
postgresqlindexingpostgresql-performance

Postgres using an index for one table but not another


I have three tables in my app, call them tableA, tableB, and tableC. tableA has fields for tableB_id and tableC_id, with indexes on both. tableB has a field foo with an index, and tableC has a field bar with an index.

When I do the following query:

select * 
from tableA 
  left outer join tableB on tableB.id = tableA.tableB_id 
where lower(tableB.foo) = lower(my_input)

it is really slow (~1 second).

When I do the following query:

select * 
from tableA 
   left outer join tableC on tableC.id = tabelA.tableC_id 
where lower(tableC.bar) = lower(my_input)

it is really fast (~20 ms).

From what I can tell, the tables are about the same size.

Any ideas as to the huge performance difference between the two queries?


UPDATES

Table sizes:

  • tableA: 2061392 rows

  • tableB: 175339 rows

  • tableC: 1888912 rows


tag info

Postgres version - 9.3.5

Full text of the queries are above.

Explain plans - tableB tableC

Relevant info from tables:

  • tableA
    • tableB_id, integer, no modifiers, storage plain
      • "index_tableA_on_tableB_id" btree (tableB_id)
    • tableC_id, integer, no modifiers, storage plain,
      • "index_tableA_on_tableB_id" btree (tableC_id)
  • tableB
    • id, integer, not null default nextval('tableB_id_seq'::regclass), storage plain
      • "tableB_pkey" PRIMARY_KEY, btree (id)
    • foo, character varying(255), no modifiers, storage extended
      • "index_tableB_on_lower_foo_tableD" UNIQUE, btree (lower(foo::text), tableD_id)
        • tableD is a separate table that is otherwise irrelevant
  • tableC
    • id, integer, not null default nextval('tableC_id_seq'::regclass), storage plain
      • "tableC_pkey" PRIMARY_KEY, btree (id)
    • bar, character varying(255), no modifiers, storage extended
      • "index_tableC_on_tableB_id_and_bar" UNIQUE, btree (tableB_id, bar)
      • "index_tableC_on_lower_bar" btree (lower(bar::text))

Hardware:

  • OS X 10.10.2

  • CPU: 1.4 GHz Intel Core i5

  • Memory: 8 GB 1600 MHz DDR3

  • Graphics: Intel HD Graphics 5000 1536 MB


Solution

Looks like running vacuum and then analyze on all three tables fixed the issue. After running the commands, the slow query started using "index_patients_on_foo_tableD".


Solution

  • The other thing is that you have your indexed columns queried as lower() , which can also be creating a partial index when the query is running.

    If you will always query the column as lower() then your column should be indexed as lower(column_name) as in:

    create index idx_1 on tableb(lower(foo));
    

    Also, have you looked at the execution plan? This will answer all your questions if you can see how it is querying the tables.

    Honestly, there are many factors to this. The best solution is to study up on INDEXES, specifically in Postgres so you can see how they work. It is a bit of holistic subject, you can't really answer all your problems with a minimal understanding of how they work.

    For instance, Postgres has an initial "lets look at these tables and see how we should query them" before the query runs. It looks over all tables, how big each of the tables are, what indexes exist, etc. and then figures out how the query should run. THEN it executes it. Oftentimes, this is what is wrong. The engine incorrectly determines how to execute it.

    A lot of the calculations of this are done off of the summarized table statistics. You can reset the summarized table statistics for any table by doing:

    vacuum [table_name];
    

    (this helps to prevent bloating from dead rows)

    and then:

    analyze [table_name];
    

    I haven't always seen this work, but often times it helps.

    ANyway, so best bet is to:

    a) Study up on Postgres indexes (a SIMPLE write up, not something ridiculously complex) b) Study up the execution plan of the query c) Using your understanding of Postgres indexes and how the query plan is executing, you cannot help but solve the exact problem.