Search code examples
postgresqljsonb

Postgres jsonb search in array with greater operator (with jsonb_array_elements)


I try to search a solution but I didn't find anything for my case...

Here is the database declaration (simplified):

CREATE TABLE documents (
    document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    data_block jsonb NULL
);

And this is an example of insert.

INSERT INTO documents (document_id, data_block)
VALUES(878979, 
    {"COMMONS": {"DATE": {"value": "2017-03-11"}},
     "PAYABLE_INVOICE_LINES": [
         {"AMOUNT": {"value": 52408.53}}, 
         {"AMOUNT": {"value": 654.23}}
     ]});
INSERT INTO documents (document_id, data_block)
VALUES(977656, 
    {"COMMONS": {"DATE": {"value": "2018-03-11"}},
     "PAYABLE_INVOICE_LINES": [
         {"AMOUNT": {"value": 555.10}}
     ]});

I want to search all documents where one of the PAYABLE_INVOICE_LINES has a line with a value greater than 1000.00

My query is

select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where (pil->'AMOUNT'->>'value')::decimal >= 1000

But, as I want to limit to 50 documents, I have to group on the document_id and limit the result to 50.

With millions of documents, this query is very expensive... 10 seconds with 1 million.

Do you have some ideas to have better performance ?

Thanks


Solution

  • Instead of cross join lateral use where exists:

    select *
    from documents d
    where exists (
      select 1
      from jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
      where (pil->'AMOUNT'->>'value')::decimal >= 1000)
    limit 50;
    

    Update

    And yet another method, more complex but also much more efficient.

    Create function that returns max value from your JSONB data, like this:

    create function fn_get_max_PAYABLE_INVOICE_LINES_value(JSONB) returns decimal language sql as $$
      select max((pil->'AMOUNT'->>'value')::decimal)
      from jsonb_array_elements($1 -> 'PAYABLE_INVOICE_LINES') as pil $$
    

    Create index on this function:

    create index idx_max_PAYABLE_INVOICE_LINES_value
      on documents(fn_get_max_PAYABLE_INVOICE_LINES_value(data_block));
    

    Use function in your query:

    select *
    from documents d
    where fn_get_max_PAYABLE_INVOICE_LINES_value(data_block) > 1000
    limit 50;
    

    In this case the index will be used and query will be much faster on large amount of data.

    PS: Usually limit have sense in pair with order by.