Search code examples
sqlpostgresqlindexingpostgismulti-index

Multi column spatial + other dtype indexes? Need extreme efficiency for Postgres queries within some area + some time range


We need extreme efficiency running queries where we'll need to find an entry within an area + within a time range. Both columns are already indexed. When running queries like this:

SELECT * FROM "tablename" 
WHERE ("tablename"."geom" @ ST_GeomFromEWKB('\x0103000020e61000000100000005000000da194bfd61c81140605edb36e7db2040da194bfd61c81140745c3233d80d21409858cfff61a91240745c3233d80d21409858cfff61a91240605edb36e7db2040da194bfd61c81140605edb36e7db2040'::bytea) 
AND "tablename"."date" >= '2019-01-01T00:00:00+00:00'::timestamptz)

explains show Postgres runs two bitmap index scans and then a bitmap heap scan on these subsets. This is performant but not performant enough as the DB has tens of millions of entries and these subsets have hundreds of thousands of entries and this heap scan + possibly writing subsets to disk causes an unacceptable ~1000ms delay.

A multi column index would drastically improve performance but we don't know how to make one that Postgres will use with these queries. We've found btree-gist, and we can create a multi column index with it:

CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE INDEX test_mult_idx ON tablename USING GIST (geom, date);

But when we run our above query with this multi column index it isn't used, as the query planner uses each column's single column index. We dropped these single col indexes to test whether we could force it and it uses the multi column index but there's no performance improvement. It just does two index scans then a heap scan.

We've scoured the internet and can't find any mention of spatial + other field multi column indexes anywhere. We don't understand the math that runs btree / gist / other types of indexes - is this even logically possible with different data types? Is there anything we can do?

Extra info requested by commenters: execution plan unformatted generated from explain (analyze, buffers) select ...

"Bitmap Heap Scan on historic  (cost=1894.90..2485.28 rows=150 width=286) (actual time=110.183..226.016 rows=24115 loops=1)"
"  Recheck Cond: ((geom @ '0103000020E610000001000000050000005E11FC6F25E26240ED2AA4FCA4FE40C05E11FC6F25E26240C0EC9E3C2CF440C0C58F31772DE96240C0EC9E3C2CF440C0C58F31772DE96240ED2AA4FCA4FE40C05E11FC6F25E26240ED2AA4FCA4FE40C0'::geometry) AND (date >= '2019-01-01 11:00:00+11'::timestamp with time zone))"
"  Rows Removed by Index Recheck: 173436"
"  Heap Blocks: exact=6673 lossy=7503"
"  Buffers: shared read=18750"
"  ->  BitmapAnd  (cost=1894.90..1894.90 rows=150 width=0) (actual time=108.636..108.637 rows=0 loops=1)"
"        Buffers: shared read=4574"
"        ->  Bitmap Index Scan on geom_idx  (cost=0.00..237.10 rows=5425 width=0) (actual time=81.729..81.729 rows=261632 loops=1)"
"              Index Cond: (geom @ '0103000020E610000001000000050000005E11FC6F25E26240ED2AA4FCA4FE40C05E11FC6F25E26240C0EC9E3C2CF440C0C58F31772DE96240C0EC9E3C2CF440C0C58F31772DE96240ED2AA4FCA4FE40C05E11FC6F25E26240ED2AA4FCA4FE40C0'::geometry)"
"              Buffers: shared read=4106"
"        ->  Bitmap Index Scan on date_idx  (cost=0.00..1657.47 rows=150005 width=0) (actual time=21.662..21.662 rows=530545 loops=1)"
"              Index Cond: (date >= '2019-01-01 11:00:00+11'::timestamp with time zone)"
"              Buffers: shared read=468"
"Planning Time: 0.164 ms"
"Execution Time: 227.673 ms"

select column_name, data_type, character_maximum_length from INFORMATION_SCHEMA.COLUMNS, unsure why spatial field is "USER-DEFINED". Other cols are integer, real, text, and double precision:

"column_name"   "data_type"
"geom"          "USER-DEFINED"
"date"          "timestamp without time zone"

Solution

  • Your table says the column is "timestamp without time zone", but your query has the value specified as "timestamp with time zone". In my hands, this type mismatch alone is enough to dissuade it from using the multi-column index. Of course I don't have access to your real dataset, so I don't know what would happen on that.