Search code examples
sqlpostgresqlwindow-functionsgaps-and-islandssql-view

Is there a way to fill in missing data points with NULL in SQL VIEWs?


I have a table called measurement_stat that takes the following form:

id     sensor_name              timestamp           value_cal 
1   measurement_status    2020-07-28 16:00:00    start_measurement
2   measurement_status    2020-07-28 17:00:00    stop_measurement
3   measurement_status    2020-07-28 18:00:00    start_measurement
4   measurement_status    2020-07-28 19:00:00    stop_measurement
5   measurement_status    2020-07-28 20:00:00    start_measurement
...          

I want to create a VIEW that returns this:

id      start_measurement      stop_measurement
1      2020-07-28 16:00:00    2020-07-28 17:00:00
2      2020-07-28 18:00:00    2020-07-28 19:00:00
...

How do I do this? I have tried doing this:

CREATE VIEW start_times AS 
   SELECT timestamp AS start_measurements FROM measurement_stat
      WHERE sensor_name = 'measurement_stat' AND value_cal = 'start_measurement'

Then I do the same for stop_times. Finally I perform a LEFT OUTER JOIN on the two VIEWs that I just created.

This works but, it has a problem. Occasionally while taking data if something goes wrong, we have to stop the measurement and sometimes 'stop_measurement' doesn't log. So there is a start_measurement with no associated stop_measurement. This issue throws my VIEW off and the columns become out of sync. If this starts happening often, then problem just compounds and gets even worse.

Is there a way of handling this in SQL? It would be great if I could throw in NULLs in place of 'stop_measurement' if there are 2 consecutive 'starts' without a 'stop' in between


Solution

  • You can use window functions. The idea is to start a new group every time a 'start_measurement' row is met for a given sensor. You can then use this information to group the data:

    create view v_measurement_stat as
    select
        sensor_name,
        min(timestamp) start_measurement,
        max(timestamp) filter(where value_cal = 'stop_measurement') stop_measurement
    from (
        select 
            ms.*,
            count(*) filter(where value_cal = 'start_measurement')
                over(partition by sensor_name order by timestamp) grp
        from measurement_stat ms
    ) t
    group by sensor_name, grp
    order by sensor_name, min(timestamp)