Tags: sql, postgresql, query-optimization

Optimizing indexes for large PostgreSQL table


I have a large table (location_history) with uid and geohash columns; together they form the primary key, and each is also indexed individually.

Indexes:
    "location_history_pkey" PRIMARY KEY, btree (uid, geohash)
    "ix_location_history_geohash" btree (geohash)
    "ix_location_history_uid" btree (uid)

But when I query the data with the following constraints, it runs really slowly (169 seconds!) even though both columns are already indexed:

EXPLAIN (analyze, verbose, buffers) SELECT * FROM location_history WHERE uid = 123 AND geohash >= 'hzzzzzzzzzz';
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using location_history_pkey on public.location_history  (cost=0.71..84735.92 rows=22438 width=29) (actual time=1.434..114713.751 rows=189319 loops=1)
   Output: created_at, updated_at, uid, geohash, is_from_zenly
   Index Cond: ((location_history.uid = 123) AND ((location_history.geohash)::text >= 'hzzzzzzzzzz'::text))
   Buffers: shared hit=5822 read=179864 dirtied=132 written=19338
   I/O Timings: read=111433.601 write=2564.930
 Query Identifier: -7646491345250917333
 Planning Time: 0.103 ms
 Execution Time: 114746.908 ms
(8 rows)

And here is the result from pgstattuple():

SELECT * FROM pgstattuple(416708);
  table_len   | tuple_count |  tuple_len   | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
--------------+-------------+--------------+---------------+------------------+----------------+--------------------+------------+--------------
 812873588736 | 15570800464 | 685124233392 |         84.28 |             1506 |          66264 |                  0 |  394834592 |         0.05
(1 row)

The bloat ratio (computed from avg_leaf_density in pgstatindex) is about 34%:

SELECT 100-(pgstatindex('location_history_pkey')).avg_leaf_density;

      ?column?
--------------------
 34.260000000000005
(1 row)

Could you share any tips to speed up the query? Thanks.


Solution

  • You could get a big improvement if you had an index-only scan, which you could get by including in the index all the columns needed by the query, apparently (uid, geohash, is_from_zenly, created_at, updated_at). In that case it would only need to visit the index pages and not the (randomly scattered) table pages, as long as the table was well vacuumed. You could get that "covering" index without partially duplicating the primary key index by doing:

    alter table location_history add constraint location_history_pkey primary key (uid, geohash) include (is_from_zenly, created_at, updated_at);
    

    Of course you would need to drop the current primary key first, so that would entail some downtime. You could instead just create the new index and live with the redundant indexes (see the sketch below).

    Of course, if your real queries can reference more columns than just the ones you showed in your example, this might not be a very complete solution.
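
    If the downtime of dropping and recreating the primary key is a concern, a minimal sketch of the low-lock route might look like the following. This is only a sketch: the index name location_history_covering_idx is a placeholder, and it assumes PostgreSQL 11 or later (for INCLUDE) and that no foreign keys reference the current primary key.

    -- Build the covering index without blocking writes.
    -- CONCURRENTLY cannot run inside a transaction block.
    CREATE UNIQUE INDEX CONCURRENTLY location_history_covering_idx
        ON location_history (uid, geohash)
        INCLUDE (is_from_zenly, created_at, updated_at);

    -- Keep the visibility map fresh so the planner can use an
    -- index-only scan with few or no "Heap Fetches".
    VACUUM location_history;

    -- Option A: keep the new index alongside the existing primary key
    -- and simply re-check the plan.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM location_history
    WHERE uid = 123 AND geohash >= 'hzzzzzzzzzz';

    -- Option B: swap the new index in as the primary key in one short
    -- transaction (assumes nothing references the current key).
    BEGIN;
    ALTER TABLE location_history DROP CONSTRAINT location_history_pkey;
    ALTER TABLE location_history
        ADD CONSTRAINT location_history_pkey
        PRIMARY KEY USING INDEX location_history_covering_idx;
    COMMIT;

    With option B, PostgreSQL renames the underlying index to match the constraint name. Either way, check the new plan for "Index Only Scan" and a low "Heap Fetches" count to confirm the table pages are no longer being visited.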