Search code examples
postgresqljsonb

Optimizing Postgres JSONB query with not null constraint


I've got a Postgres 9.4.4 database with 1.7 million records with the following information stored in a JSONB column called data in a table called accounts:

data: {
  "lastUpdatedTime": "2016-12-26T12:09:43.901Z",
  "UID": "2c5bb7fd-1a00-4988-8d92-ffaa52ebc20d",
  "data": {
    "country": "UK",
    "verified_at": "2017-01-01T23:49:10.217Z"
  }
}

The data format cannot be changed since this is legacy information.

I need to obtain all accounts where the country is UK, the verified_at value is not null and the lastUpdatedTime value is greater than some given value.

So far, I have the following query:

SELECT * FROM "accounts"
WHERE (data @> '{ "data": { "country": "UK" } }')
AND (data->'data' ? 'verified_at')
AND ((data->'data' ->> 'verified_at') is not null)
AND (data ->>'lastUpdatedTime' > '2016-02-28T05:49:08.511846')
ORDER BY data ->>'lastUpdatedTime' LIMIT 100 OFFSET 0;

And the following indexes:

"accounts_idxgin" gin (data)
"accounts_idxgin_on_data" gin ((data -> 'data'::text))

I've managed to get the query time down to about 1000 to 4000ms

Here is the analyze from the query:

 Bitmap Heap Scan on accounts  (cost=41.31..6934.50 rows=9 width=1719)
                               (actual time=7.273..1067.657 rows=23190 loops=1)
   Recheck Cond: ((data -> 'data'::text) ? 'verified_at'::text)
   Filter: ((((data -> 'data'::text) ->> 'verified_at'::text) IS NOT NULL)
           AND ((data ->> 'lastUpdatedTime'::text) > '2016-02-01 05:49:08.511846'::text)
           AND (((data -> 'data'::text) ->> 'country'::text) = 'UK'::text))
   Rows Removed by Filter: 4
   Heap Blocks: exact=16039
   ->  Bitmap Index Scan on accounts_idxgin_on_data  (cost=0.00..41.30 rows=1773 width=0)
       (actual time=4.618..4.618 rows=23194 loops=1)
         Index Cond: ((data -> 'data'::text) ? 'verified_at'::text)
 Planning time: 0.448 ms
 Execution time: 1069.344 ms
(9 rows)

I have the following questions

  1. Is there anything I can do to further speed up this query?
  2. What is the correct way to speed up a field is not null query with JSONB? I ended up using the existence operator with (data->'data' ? 'verified_at') to filter out a large number of non-matching records, because much of my data doesn't have verified_at as a top level key. This increased the speed of the query, but I'm wondering if there's a general approach to optimizing this type of query.
  3. In order to use the existence operator with (data->'data' ? 'verified_at'), I needed to add another index on ((data -> 'data'::text)). I already had an index on gin (data), but the existence operator didn't use this. Why is that? I thought the existence and containment operators would use this index.

Solution

  • After playing around a bit more, I've managed to reduce my query time from around 1000ms to 350ms by creating the following partial index:

     CREATE INDEX index_accounts_partial_on_verified_at
     ON accounts ((data->'data'->'verified_at'))
     WHERE (data->'data'->>'verified_at') IS NOT NULL
     AND (data->'data' ? 'verified_at')
     AND (data->'data'->>'country' = 'UK');
    

    I was able to hardcode some of the values in this index, such as country=UK because I only need to consider UK accounts for this query. I was also able to remove the index on ((data->'data')) which was 258MB, and replace it with the partial index which is only 1360 kB!

    For anyone interested, I found the details for building a partial JSONB index from here