Search code examples
sqlpostgresqlintervalswindow-functionsgaps-and-islands

Calculate interval between boolean column change


I have a table with measurements of weather here is a simplified version of it:

"station_id","measured_at","rainy"
-------------------------------------------------------------------------
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:35:35.27+00",FALSE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:36:33.976+00",FALSE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:37:33.864+00",FALSE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:38:34.767+00",TRUE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:39:36.076+00",TRUE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:40:29.776+00",FALSE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:41:35.579+00",FALSE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:42:34.274+00",TRUE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:43:23.842+00",TRUE
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485","2020-01-31 18:44:35.08+00",FALSE

and I need to calculate how much time is been sunny. I know I need to use window functions but I'm stuck. Because I need this interval to be calculated in some range for example for last day. I have been able to make a query to this stage.

SELECT 
    prev.station_id,
    prev.rainy,
    prev.measured_at AS started_at,
    COALESCE(LEAD(prev.measured_at) OVER (ORDER BY prev.measured_at ASC), NOW()) AS ended_at,
    (COALESCE(LEAD(prev.measured_at) OVER (ORDER BY prev.measured_at ASC), NOW()) - prev.measured_at) AS diff
FROM (
    SELECT
        m.station_id,
        m.measured_at,
        m.rainy,
        COALESCE(LEAD(m.rainy) OVER (ORDER BY m.measured_at ASC), m.rainy) AS prev_rainy
    FROM
        z_measurements m
    WHERE m.measured_at >= '2020-01-30T00:00:00.000Z'
    ORDER BY m.measured_at ASC
) prev
WHERE prev.rainy IS DISTINCT FROM prev.prev_rainy
ORDER BY prev.measured_at ASC;

this query results in to:

"station_id","rainy","started_at","ended_at","diff"
---------------------------------------------------
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485",FALSE,"2020-01-31 18:37:33.864","2020-01-31 18:39:36.076+00","00:02:02.212"
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485",TRUE,"2020-01-31 18:39:36.076","2020-01-31 18:41:35.579+00","00:01:59.503"
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485",FALSE,"2020-01-31 18:41:35.579","2020-01-31 18:43:23.842+00","00:01:48.263"
"b6b53561-dab5-4b9a-8d28-a7de1e4d1485",TRUE,"2020-01-31 18:43:23.842","2020-01-31 21:18:04.89333+00","02:34:41.05133"

but if the first lane of this result is rainy = False the start_date should be the same as one in query 2020-01-30T00:00:00.000Z (I'm assuming that if it's sunny on the first result it has been sunny for whole time between my range start and this first recorded measurement) and also there is missing the last row because last measured state is sunny. So started_at for this last row should be 2020-01-31 18:44:35.08+00 and end_at should be NOW().

Can somebody please help me?

I'm using postgresql 12.1.


Solution

  • SELECT t.station_id, t.rainy, t.started_at, t.ended_at, t.ended_at - t.started_at AS diff
    FROM (
        SELECT
            prev.station_id,
            prev.rainy,
            CASE
                WHEN LAG(prev.measured_at) OVER measured_at_by_station_id IS NULL THEN '2020-01-30T00:00:00.000Z'
                ELSE prev.measured_at
            END AS started_at,
            LEAD(prev.measured_at, 1, NOW()) OVER measured_at_by_station_id AS ended_at
        FROM (
            SELECT
                m.station_id,
                m.measured_at,
                m.rainy,
                LAG(m.rainy, 1, NOT(m.rainy)) OVER (PARTITION BY m.station_id ORDER BY m.measured_at ASC) AS prev_rainy
            FROM z_measurements m
            WHERE m.measured_at >= '2020-01-30T00:00:00.000Z'
            ORDER BY m.station_id ASC, m.measured_at ASC
        ) prev
        WHERE prev.rainy IS DISTINCT FROM prev.prev_rainy
        WINDOW measured_at_by_station_id AS (PARTITION BY prev.station_id ORDER BY prev.measured_at ASC)
    ) t
    ORDER BY t.station_id ASC, t.started_at ASC