Search code examples
sqlpostgresqlgroup-byleft-join

How to get aggregated results that found inside WHERE id IN fields?


I need to get list of transactions with list of logs and tags inside each transaction and the most important make search by field in logs and tags and unite it all. Example below is search by field1 and tag1. How to do that? Also important: I have 500m+ rows. If using WHERE id IN .. will it be slow?

transactions
 id int
 hash var

id | hash 
------------
1  |  h1
2  |  h2
3  |  h3
logs
 transaction_id int
 field1 var
 value var

transaction_id | field1 | value 
-------------------------------
1              | f1     | v1
1              | f2     | v2
2              | f3     | v3
3              | f4     | v4
tags
 transaction_id int
 tag1 var
 value var

transaction_id | tag1   | value 
-------------------------------
1              | t1     | v1
2              | t2     | v2
2              | t3     | v3

I need result:

id | hash | logs_array                                                         | tags_array
------------------------------------------------------------------------------------------
1  | h1   | [{'field1': 'f1', 'value': 'v1'}, {'field1': 'f2', 'value': 'v2'}] | [{'tag1': 't1', 'value': 'v1'}]
2  | h2   | [{'field1': 'f3', 'value': 'v3'}]                                  | [{'tag1': 't2', 'value': 'v2'}, {'tag1': 't3', 'value': 'v3'}]
3  | h3   | [{'field1': 'f4', 'value': 'v4'}]                                  | []

Example template I can think of:

SELECT t.id, t.hash, array_agg(..) FROM transactions t WHERE id in
    (
        SELECT transaction_id FROM logs WHERE field1={any_string}
    )
    OR id in
    (
        SELECT transaction_id FROM tags WHERE tag1={any_string}
    )
LEFT JOIN logs lo ON t.id = lo.transaction_id
LEFT JOIN tags ta ON t.id = ta.transaction_id

Solution

  • That could be accomplished with a query like this

    select t.id, t.hash, 
    array_agg (distinct jsonb_strip_nulls(jsonb_build_object('field1', l.field1, 'value', l."value"))) logs_array,
    array_agg (distinct jsonb_strip_nulls(jsonb_build_object('tag1', tg.tag1, 'value', tg."value"))) tags_array
    from transactions t 
    left join logs l on t.id = l.transaction_id 
    left join tags tg on t.id = tg.transaction_id
    group by t.id, t.hash 
    order by t.id;
    

    Another version using conditions and stripping nulls

    select t.id, t.hash, 
    array_agg (distinct jsonb_strip_nulls(jsonb_build_object('field1', l.field1, 'value', l."value"))) logs_array,
    array_agg (distinct jsonb_strip_nulls(jsonb_build_object('tag1', tg.tag1, 'value', tg."value"))) tags_array
    from transactions t 
    left join logs l on t.id = l.transaction_id 
    left join tags tg on t.id = tg.transaction_id
    where l.field1 = 'f1' and tg.tag1 = 't1'
    group by t.id, t.hash
    order by t.id;
    

    another version optimized for performance

    with cte as 
     (  select transaction_id id, 
        array_agg (distinct jsonb_strip_nulls(jsonb_build_object('tag1', tag1, 'value', value))) res_array
        from tags 
        where tag1 = 't1' 
        group by transaction_id
        union all
        select transaction_id id, 
        array_agg (distinct jsonb_strip_nulls(jsonb_build_object('field1', field1, 'value', value))) res_array
        from logs 
        where field1 = 'f1'
        group by transaction_id 
     )
    select t.id, t.hash, array_agg(res_array) res_array from transactions t
    right join cte c on t.id = c.id
    group by t.id, t.hash;
    

    Fiddle to test