I am using PostgreSql 9.6. (please don't ask me to upgrade - I must use 9.6)
I have a table which has a jsonb column. I have created a GIN index on this column. The table has 320,000 records. A "explain analyse" reveals that the index is not being used and a simple query takes about 3 seconds.
We have a debug logger that records anything at all but stores it as JSON in the format { "key1":"value1", "key2":"value2", ... }
We gather statistics by extracting values for keys.
The table and index are created like so:
CREATE TABLE log (
id SERIAL PRIMARY KEY,
logEntry jsonb
);
CREATE INDEX log_idx_logentry on log using gin (logentry);
I run a query that I know will return no results:
SELECT id FROM log WHERE logentry->>'modality' = 'XT'
This takes 3 seconds to run.
EXPLAIN ANALYSE SELECT id FROM log WHERE logentry->>'modality' = 'XT' produces:
Seq Scan on log (cost=0.00..32458.90 rows=1618 width=4) (actual time=1328.654..1328.660 rows=0 loops=1)
Filter: ((logentry ->> 'modality'::text) = 'XT'::text)
Rows Removed by Filter: 323527
Planning time: 0.450 ms
Execution time: 1328.724 ms
(5 rows)
Similar results if I write the query as:
EXPLAIN ANALYSE SELECT id FROM log WHERE logentry->'modality' @> '"XT"'::jsonb
Seq Scan on log (cost=0.00..32458.90 rows=324 width=4) (actual time=1421.262..1421.266 rows=0 loops=1)
Filter: ((logentry -> 'modality'::text) @> '"XT"'::jsonb)
Rows Removed by Filter: 323527
Planning time: 0.080 ms
Execution time: 1421.309 ms
(5 rows)
And, just to prove that there is something in the table,
SELECT COUNT(id) FROM log WHERE logentry->'modality' @> '"CT"'::jsonb
returns 42528
So why is the index not being used? In production we would expect the log table to contain millions of records.
klin has the correct answer. As the database gets larger, the difference in performance becomes more pronounced.