Postgres JSONB object filtering

I have a Postgres table that looks like this:

CREATE TABLE products(
   id string,
   attributes jsonb,
   PRIMARY KEY(id)
);

An example of attributes field could be like:

 {
      "endDate": [
        "2024-02-20T21:00:00.000Z"
      ],
      "countries": [
        "US","IT","ES"
      ],
      "type": [
        "RETAIL"
      ],
      "startDate": [
        "2024-02-13T08:00:00.000Z"
      ],
      "categories": [
        "ELECTRONICS"
      ],
      "currency": [
        "USD"
      ],
      "status": [
        "ACTIVE"
      ]
    }

I need to create filter where as an input I might get a list of countries e.g. ["US","MX"] for which each product having one of these countries would match criteria and/or where startDate might be after certain provided date. I'm also interested in performance of such query, since this table might be really large.

I've tried sto filter products by countries using this query:

SELECT *
FROM products
WHERE
  (attributes @> '{ "countries": ["US","MX","JP"] }')

But this query would list only products which have all 3 provided countries. I need to have at least one country match, and in this example, it has "US" so it should be match.

Solution

It would be trivial on a normalised structure, but until you find time and budget to change it, you can use jsonb JSONPath expressions with @@ predicate check operator: demo

select * from products
where attributes 
@@ '  exists($.countries[*] 
             ?(  @=="US"
               ||@=="MX" ))
    && exists($.startDate?
              (@>"2024-02-12T07:00:00.000Z"))'
;

id	attributes
1	{"type": ["RETAIL"], "status": ["ACTIVE"], "endDate": ["2024-02-20T21:00:00.000Z"], "currency": ["USD"], "countries": ["US", "IT", "ES"], "startDate": ["2024-02-13T08:00:00.000Z"], "categories": ["ELECTRONICS"]}

If you pair it with a GIN index:

create index on products using gin(attributes jsonb_path_ops);

You'll see @@ speed up significantly thanks to index scans:


Bitmap Heap Scan on public.products (cost=116.51..2432.07 rows=13325 width=396) (actual time=4.505..56.491 rows=9513 loops=1)
Output: id, attributes
Recheck Cond: (products.attributes @@ '(exists ($."countries"[*]?(@ == "US" \|\| @ == "MX")) && exists ($."startDate"?(@ > "2024-02-12T07:00:00.000Z")))'::jsonpath)
Rows Removed by Index Recheck: 28518
Heap Blocks: exact=2149
-> Bitmap Index Scan on products_attributes_idx (cost=0.00..113.18 rows=13325 width=0) (actual time=4.238..4.239 rows=38031 loops=1)
Index Cond: (products.attributes @@ '(exists ($."countries"[*]?(@ == "US" \|\| @ == "MX")) && exists ($."startDate"?(@ > "2024-02-12T07:00:00.000Z")))'::jsonpath)
Planning Time: 0.374 ms
Execution Time: 56.930 ms

That's a test on 40k random sample resembling yours. It's also visible that not the entire JSONPath is supported by the index, hence the re-check:

For these operators [@@, @?], a GIN index extracts clauses of the form accessors_chain = constant out of the jsonpath pattern, and does the index search based on the keys and values mentioned in these clauses. The accessors chain may include .key, [*], and [index] accessors. The jsonb_ops operator class also supports .* and .** accessors, but the jsonb_path_ops operator class does not.

The part with date comparison using > doesn’t qualify, which is why a recheck is necessary. Searches where you don’t need the > should be faster.

If you're dealing with non-uniform timestamp formats, you might want to add a .datetime() method in there.