Search code examples
sqlprestotrinosubtotal

Calculate subtotal with special condition


id flow amount date
1 IN 100 2023-08-01
2 OUT 100 2023-08-02
3 IN 100 2023-08-03
4 OUT 100 2023-08-04
5 OUT 100 2023-08-05

In a table like above, I want to sum all OUT transactions which take place after every IN transaction. The desired outcome should look like below:

id flow amount date out_amount
1 IN 100 2023-08-01 100
3 IN 100 2023-08-03 200

How could this be achieved?


Solution

  • If you are using Presto then you can follow gaps-and-islands approach (based on provided data I would use id as ordering field, but for actual you might need/want to use date):

    -- sample data
    with dataset(id, flow, amount, date) as (
        values (1,  'IN',   100,    '2023-08-01'),
        (2, 'OUT',  100,    '2023-08-02'),
        (3, 'IN',   100,    '2023-08-03'),
        (4, 'OUT',  100,    '2023-08-04'),
        (5, 'OUT',  100,    '2023-08-05'),
        (6, 'IN',   100,    '2023-08-11')
    )
    
    -- query
    select min(id) id,
        min_by(flow, id) flow,
        min_by(amount, id) amount,
        min_by(date, id) date,
        sum(if(flow = 'OUT', amount)) out
    from (select *,
                 count_if(flow = 'IN') over (order by id) as grp
          from dataset)
    group by grp
    order by id;
    

    Output:

    id flow amount date out
    1 IN 100 2023-08-01 100
    3 IN 100 2023-08-03 200
    6 IN 100 2023-08-11 NULL

    For Trino you can also use MATCH_RECOGNIZE:

    -- query
    SELECT *
    FROM dataset MATCH_RECOGNIZE(
         ORDER BY id
         MEASURES
             A.id as id,
             A.flow as flow,
             A.amount as amount,
             A.date as date,
             SUM(B.amount) AS out
         ONE ROW PER MATCH
         AFTER MATCH SKIP PAST LAST ROW
         PATTERN (A B*)
         DEFINE
                A AS flow = 'IN',
                B AS flow = 'OUT'
         );