Search code examples
sqlpostgresqlselectoptimizationsubquery

Efficient way to associate each row to latest previous row with condition.(PostgreSQL)


I have a table in which two different kind of rows are inserted:

  • Some rows represent a datapoint, a key-value pair in a specific of time
  • Other rows represent a new status, which persist in the future until the next status

In the real problem, I have a timestamp column which stores the order of the events. In the SQL Fiddle example I am using a SERIAL integer field, but it is the same idea.

Here is the example:

http://www.sqlfiddle.com/#!17/a0823/6

I am looking for an efficent way to retrieve each row of the first type with the its status (which is given by the latest status row before current row) associated.

The query on the sqlfiddle link is an example, but uses two subqueries which is very inefficient.

I cannot change the structure of the table nor create other tables, but I can create any necessary index on the table.

I am using PostgreSQL 11.4


Solution

  • The most efficient method is probably to use window functions:

    select p.*
    from (select p.*,
                 max(attrvalue) filter (where attrname = 'status_t1') over (partition by grp_1 order by id) as status_t1,
                 max(attrvalue) filter (where attrname = 'status_t2') over (partition by grp_2 order by id) as status_t2
          from (select p.*,
                       count(*) filter (where attrname = 'status_t1') over (order by id) as grp_1,
                       count(*) filter (where attrname = 'status_t2') over (order by id) as grp_2
                from people p
               ) p
         ) p
    where attrname not in ('status_t1', 'status_t2');
    

    Here is a db<>fiddle.