Search code examples
postgresqlplpgsqlreduction

Postgresql sequential add/remove reduction operation


I have a table with line numbers and either a "define" or an "undefine" event of an identifier. Example:

line_no | def | undef
--------------------
1       | 'a' | NULL
2       | 'b' | NULL
3       | NULL| 'a'
... 
42      | NULL| 'b'

I want to compute the "live variables" information on every line. Iteratively, I'd just write code like the following pseudocode:

live = [] 
for each r in table:
    if r.def:
        live.append(r.def)
    else:
        live.remove(r.undef)
    store(r.line_no, live) 

The expected result is a table like :

line_no | live
1.      | ['a']
2.      | ['a', 'b'] 
3.      | ['b'] 
... 
42.     | [] 

I managed to write the equivalent sequential loop as a plpgsql function. However, I was wondering if there is a (maybe more elegant) way as a SQL query? I tried various things using lag() but somehow that never resulted in this "reduction" operation I am looking for?

(bonus points if the query can also partition over a field, such as 'filename')


Solution

  • If you want a pure SQL solution, use a recursive query.

    with recursive cte as (
        select line_no, array[def] as list
        from my_table
        where line_no = 1
    union all
        select 
            t.line_no, 
            case when def is null 
                then array_remove(list, undef)
                else array_append(list, def)
            end
        from my_table t
        join cte c on c.line_no = t.line_no- 1
    )
    select *
    from cte;
    

    However, a more efficient and flexible solution in this case may be creating a function.

    create or replace function list_of_vars()
    returns table (line_no int, list text[])
    language plpgsql as $$
    declare
        arr text[];
        rec record;
    begin
        for rec in
            select *
            from my_table
            order by line_no
        loop
            if rec.def is null then
                arr := array_remove(arr, rec.undef);
            else
                arr := array_append(arr, rec.def);
            end if;
            line_no := rec.line_no;
            list := arr;
            return next;
        end loop;
    end
    $$;
    

    Use:

    select *
    from list_of_vars()
    order by line_no
    

    Test it in db<>fiddle.