Search code examples
sqlpostgresqltimestamprank

Postgresql compare consecutive rows and insert identical row if there are no values


I have a table that gives me data every 15 minute and I need that range of time. I noticed that sometimes I don't have data for 3/4 hours but I need to duplicate the last row available with the missing timestamp.

Example:

product_id total_revenue timestamp
1 50 01-01-2021 00:00:00
2 17 01-01-2021 00:00:00
3 30 01-01-2021 00:00:00
1 67 01-01-2021 00:15:00
2 31 01-01-2021 00:15:00
1 67 01-01-2021 00:30:00
2 31 01-01-2021 00:30:00
3 33 01-01-2021 00:30:00

But I need an output like:

product_id total_revenue timestamp
1 50 01-01-2021 00:00:00
2 17 01-01-2021 00:00:00
3 30 01-01-2021 00:00:00
1 67 01-01-2021 00:15:00
2 31 01-01-2021 00:15:00
3 30 01-01-2021 00:15:00
1 67 01-01-2021 00:30:00
2 31 01-01-2021 00:30:00
3 33 01-01-2021 00:30:00

I have a select statement like:

select product_id,total_revenue,timestamp from revenue

(I calculate the difference between two consecutive rows too).

Does anybody know how to help me?


Solution

  • One method uses generate_series() and lead():

    with tt as (
          select product_id, total_revenue, timestamp,
                 lead(timestamp) over (partition by product_id order by timestamp) as next_timestamp
          from t
         )
    select tt.product_id, coalesce(gs.ts, tt.timestamp),
           tt.total_revenue
    from tt left join lateral
         generate_series(timestamp, next_timestamp - interval '15 minute', interval '15 minute') gs(ts);
    

    Note: My guess is that you also want this extended to the most recent timestamp in the table:

    with tt as (
          select product_id, total_revenue, timestamp,
                 lead(timestamp) over (partition by product_id order by timestamp) as next_timestamp,
                 max(timestamp) over () as max_timestamp
          from t
         )
    select tt.product_id, coalesce(gs.ts, tt.timestamp),
           tt.total_revenue
    from tt left join lateral
         generate_series(timestamp,
                         coalesce(next_timestamp - interval '15 minute', max_timestamp),
                         interval '15 minute'
                        ) gs(ts);
    

    Also, if the timestamps are not exactly at 15 minute intervals, then I would suggest that you ask a new question with explanation and more realistic sample data.