Search code examples
sqlsqlitewindow-functions

SQL Window function to get 2 closest events


I was trying to solve the "analyze weather patterns" problem as described here (https://joins-238123.netlify.com/window-functions/)

You're worried that hurricanes are happening more frequently, so you decide to do a tiny bit of analysis. For each kind of weather event find the 2 events that occurred the closest together and when they happened

Table weather with data like:

type    day
rain    6
rain    12
thunderstorm    13
rain    21
rain    27
rain    37
rain    44
rain    54
thunderstorm    56
rain    58
rain    61
rain    65
rain    68
rain    73
rain    82
hurricane   87
rain    92
rain    95
rain    98
rain    108
thunderstorm    111
rain    118
rain    123
rain    128
rain    131
hurricane   135
rain    136
rain    140
rain    149
thunderstorm    158
rain    159
rain    167
rain    175
hurricane   178
rain    179
rain    186
rain    192
rain    200
thunderstorm    202
rain    210
rain    219
thunderstorm    222
rain    226
rain    232
thunderstorm    238
rain    241
rain    246
rain    253
thunderstorm    257
rain    257
rain    267
rain    277
rain    286
rain    295
rain    302
rain    307
thunderstorm    312
rain    316
rain    325
thunderstorm    330

I could come up with :

select type, day, COALESCE(day - LAG(day, 1) over (partition by type order by day), 0) as days_since_previous from weather

It gives me results like:

type        day days_since_previous
hurricane   87  0
hurricane   135 48
hurricane   178 43
rain        6   0
rain        12  6
rain        21  9
rain        27  6

But I can't get it to narrow the results down to the 2 closest events and only display the days between them.

How do I go about doing so that I get the desired result like:

type           day  days_since_previous
rain            61  3
hurricane       178 43
thunderstorm    238 16

Solution

  • You can use another window function to widdle down the rows:

    SELECT type, day, days_since_previous
    FROM (
      SELECT type, day, (day - prev_day) AS days_since_previous,
        ROW_NUMBER() OVER(PARTITION BY type ORDER BY (day - prev_day)) AS RowNum
      FROM (
        select type, day,
          LAG(day, 1) over (partition by type order by day) as prev_day
        from weather
      ) src
      WHERE prev_day IS NOT NULL -- Ignore "first" events
    ) src
    WHERE RowNum = 1
    order by day
    

    I also removed the COALESCE since that was causing the "first" events to be included in the calculations.