For searching on specific keys within a jsonb
column, I wanted to create an index on the column.
Using: Postgres 10.2
Ignoring some unrelated columns, I have the table animals
with these columns (omitting some unrelated columns):
animalid PK number
location (text)
type (text)
name (text)
data (jsonb) for eg: {"age": 2, "tagid": 11 }
I would need to search based on: location
, type
and tagId
. Like:
where location = ? and type = 'cat' and (data ->> 'tagid') = ?
Other important points:
How do I make sure that the search is fast? Options I have considered:
animal_id
, location
, tagId
(although FK to the partitioned parent table is not possible)location
, type
and jsonb key.tagId
- which would be null for all animals but cat.I do have an index on other columns on the table - but a bit confused on how to create an index to make the search for cats based on the tagid
fast. Any suggestions?
UPDATE (Ignoring partitions):
(Testing on a partitioned table)
So I decided to go with the option as suggested by Erwin and tried creating an index
CREATE INDEX ON animals_211 (location, ((data->>'tagid')::uuid)) WHERE type = 'cat';
And tried an EXPLAIN on the query (using the partitioned table to keep it simple):
explain select * from animals_211 a
where a.location = 32341
and a.type = 'cat'
and (data->>'tagid')::uuid = '5e54c1d9-3ea0-4bca-81d6-1000d90cc42c'
And from the results, it looks like it doesn't use the index created and does a sequential scan:
Seq Scan on animals_211 e (cost=0.00..121.70 rows=1 width=327) |
Filter: ((location = 32341) AND ((type)::text = 'cat'::text) AND (((data ->> 'tagid'::text))::uuid = '5e54c1d9-3ea0-4bca-81d6-1000d90cc42c'::uuid
UPDATE 2 (not using partial index)
It somehow seems to be the partial index, as without it - it seems to work:
CREATE INDEX tag_id_index ON animals_211 (location, type, ((data->>'tagid')::uuid))
When I do an explain plan:
Index Scan using tag_id_index on animals_211 e (cost=0.28..8.30 rows=1 width=327)
Index Cond: ((location = 32341) AND ((type)::text = 'cat'::text) AND (((data ->> 'tagid'::text))::uuid = '5e54c1d9-3ea0-4bca-81d6-1000d90cc42c'::uuid))
Based on your three "important points" I suggest a partial index on an expression:
CREATE INDEX ON animals ((data->>'tagid'))
WHERE type = 'cat';
Use CREATE INDEX CONCURRENTLY ...
to avoid locking issues with concurrent write access to the same table.
Postgres also gathers specific statistics for the partial index, which help the query planner to get appropriate estimates. Note that you need to run ANALYZE
(or VACUUM ANALYZE
) manually if you test the index right after creation before autovacuum
can kick in. See:
If tagid
is really some other data type than text
, you can also cast the expression to optimize some more. See:
Your update suggests tagid
stores UUID values. Read:
So consider this index instead:
CREATE INDEX ON animals (((data->>'tagid')::uuid)) -- !
WHERE type = 'cat';
The extra set of parentheses around (data->>'tagid')::uuid
is required to make the syntax unambiguous.
And a matching query:
SELECT *
FROM animals
WHERE location = 32341
AND type = 'cats'
AND (data->>'tagid')::uuid = '5e54c1d9-3ea0-4bca-81d6-1000d90cc42c'; -- !
Or - depending on the selectivity of each predicate and which variants of the query are possible - include location
to make it a multicolumn index:
CREATE INDEX ON animals (location, ((data->>'tagid')::uuid))
WHERE type = 'cat';
Or tagid
first if you have queries without filtering on location. See:
Since only relatively few rows are of type 'cat', the index will be relatively small, excluding the bulk of your "millions of rows". And we only ever need the index on tagid
for cats to begin with. Win-win.
If possible break out the json key data->>'tagid'
as dedicated column. (Like you considered as option 3.) Can be null where it's not applicable, null storage is very cheap. Makes storage and index cheaper, yet, and queries a bit simpler.
Postgres 10 does not support indexes on the parent table of a partitioned table. This was added in Postgres 11. Declarative partitioning has been improved a lot since. Consider upgrading to the current version 13 or later.
There is also the option of "old-style" partitioning with inheritance. Then you can have a separate partition for cats with an additional column tagid
only there. The manual:
For declarative partitioning, partitions must have exactly the same set of columns as the partitioned table, whereas with table inheritance, child tables may have extra columns not present in the parent.
Sounds like a perfect fit. But inheritance has fallen out of favor with Postgres, so I would think twice before doing that.
Either way - be it declarative or inheritance - if you have all "cats" in a separate partition a non-partial index does the job, obviously:
CREATE INDEX ON cats (location, ((data->>'tagid')::uuid));
And the query can target the partition cats
instead of the parent table:
SELECT *
FROM cats
WHERE location = 32341
AND (data->>'tagid')::uuid = '5e54c1d9-3ea0-4bca-81d6-1000d90cc42c';
Targeting the parent table should work, too. (Not sure about Postgres 10.)
SELECT *
FROM animals
WHERE type = 'cat'
AND location = 32341
AND (data->>'tagid')::uuid = '5e54c1d9-3ea0-4bca-81d6-1000d90cc42c';
But activate partition pruning for that. The manual:
Note that partition pruning is driven only by the constraints defined implicitly by the partition keys, not by the presence of indexes. Therefore it isn't necessary to define indexes on the key columns.
All other partitions should be pruned, and then you should get an index scan on the cats
partition only ...