Search code examples
sqlpostgresqlaggregate-functionswindow-functionstimescaledb

Forward fill NULL values in multiple columns


I have a table which contains some time-series data.

             time              |  bid   |  ask   
-------------------------------+--------+--------
 2018-12-27 01:04:06.978456+00 | 1.7086 |       
 2018-12-27 01:04:07.006461+00 | 1.7087 |       
 2018-12-27 01:04:07.021961+00 |        | 1.7106
 2018-12-27 01:04:08.882591+00 | 1.7025 | 1.7156
 2018-12-27 01:04:09.374118+00 |        | 1.7106
 2018-12-27 01:04:09.39018+00  | 1.7087 | 1.7156      
 2018-12-27 01:04:15.793528+00 | 1.7045 | 
 2018-12-27 01:04:15.833545+00 | 1.7083 |       
 2018-12-27 01:04:15.893536+00 |        | 1.7096
 2018-12-27 01:04:16.258062+00 | 1.7045 | 1.7095
 2018-12-27 01:04:16.653573+00 | 1.7046 | 1.7148
 2018-12-27 01:04:16.665564+00 |        | 1.7097

I would like to forward-fill over NULL values, so that the result of my query looks like this:

             time              |  bid   |  ask   
-------------------------------+--------+--------
 2018-12-27 01:04:06.978456+00 | 1.7086 |       
 2018-12-27 01:04:07.006461+00 | 1.7087 |       
 2018-12-27 01:04:07.021961+00 | 1.7087 | 1.7106
 2018-12-27 01:04:08.882591+00 | 1.7025 | 1.7156
 2018-12-27 01:04:09.374118+00 | 1.7025 | 1.7106
 2018-12-27 01:04:09.39018+00  | 1.7087 | 1.7156      
 2018-12-27 01:04:15.793528+00 | 1.7045 | 1.7156
 2018-12-27 01:04:15.833545+00 | 1.7083 | 1.7156      
 2018-12-27 01:04:15.893536+00 | 1.7083 | 1.7096
 2018-12-27 01:04:16.258062+00 | 1.7045 | 1.7095
 2018-12-27 01:04:16.653573+00 | 1.7046 | 1.7148
 2018-12-27 01:04:16.665564+00 | 1.7046 | 1.7097

How can I achieve this?

I am using postgresql 10, with the timescaledb extension


Solution

  • You can do this with a couple of window functions. In a subquery, we'll use count to count the rows, excluding nulls, up to the current row, ordered by time, which will let us figure out the separate groups. From there, we can just use the first_value of that group, if it doesn't already have a value.

    select t,
           coalesce(bid, first_value(bid) OVER (partition by bid_group ORDER BY t)) as bid_filled,
           coalesce(ask, first_value(ask) OVER (partition by ask_group ORDER BY t)) as ask_filled
    FROM (
      select t, ask, bid,
             count(bid) OVER (order by t) as bid_group,
             count(ask) OVER (order by t) as ask_group
      FROM test
    ) sub;
                 t              | bid_filled | ask_filled
    ----------------------------+------------+------------
     2018-12-27 01:04:06.978456 |     1.7086 |
     2018-12-27 01:04:07.006461 |     1.7087 |
     2018-12-27 01:04:07.021961 |     1.7087 |     1.7106
     2018-12-27 01:04:08.882591 |     1.7025 |     1.7156
     2018-12-27 01:04:09.374118 |     1.7025 |     1.7106
     2018-12-27 01:04:09.39018  |     1.7087 |     1.7156
     2018-12-27 01:04:15.793528 |     1.7045 |     1.7156
     2018-12-27 01:04:15.833545 |     1.7083 |     1.7156
     2018-12-27 01:04:15.893536 |     1.7083 |     1.7096
     2018-12-27 01:04:16.258062 |     1.7045 |     1.7095
     2018-12-27 01:04:16.653573 |     1.7046 |     1.7148
     2018-12-27 01:04:16.665564 |     1.7046 |     1.7097