Search code examples
sqldatabasegoogle-bigquerywindow-functionsgaps-and-islands

Query previous rows on a condition


I have a table of data about a user's flight booking patterns on a website. Let's assume the following data is all the historical data I have about my user.

The session_date is the day that the user came onto the website and searched a specific route, while the flight_date is the departure date of the flight. I have ordered the table by the session_date. The outcome is recorded in booked.

+---------+--------------+----------------+--------------+-------------+--------+
| user_id | session_date | departure_code | arrival_code | flight_date | booked |
+---------+--------------+----------------+--------------+-------------+--------+
| user1   | 7 Jan        | CA             | MY           | 8 Mar       |      1 |
| user1   | 8 Jan        | US             | MY           | 18 May      |      0 |
| user1   | 8 Jan        | US             | MY           | 18 May      |      1 |
| user1   | 8 Jan        | CA             | MY           | 19 Mar      |      0 |
| user1   | 9 Jan        | US             | MY           | 18 May      |      1 |
+---------+--------------+----------------+--------------+-------------+--------+

I would like to output a new column in my table, called previous_flight_date. The new column will state with each search, the previous booked flight_date for that specific route. Even if a user has searched that same route multiple times but never booked, the value in this column will be empty.


+-------+--------------+----------------+--------------+-------------+--------+----------------------+
|  _id  | session_date | departure_code | arrival_code | flight_date | booked | previous_flight_date |
+-------+--------------+----------------+--------------+-------------+--------+----------------------+
| user1 | 7 Jan        | CA             | SG           | 8 Mar       |      1 | null                 |
| user1 | 8 Jan        | US             | MY           | 18 May      |      0 | null                 |
| user1 | 8 Jan        | US             | MY           | 18 May      |      1 | null                 |
| user1 | 8 Jan        | CA             | SG           | 19 Mar      |      0 | 8 Mar                |
| user1 | 2 Feb        | US             | MY           | 2 Jul       |      1 | 18 May               |
+-------+--------------+----------------+--------------+-------------+--------+----------------------+

so for example, the column will be null until the 4th row which reflects '8 Mar', as the user had booked a flight from CA-->SG, to depart on that day.

I have tried using LAST_VALUE but it didn't work. I also do not know how I can use LAG() when I have multiple different types of routes, and I want to find previous rows on a condition. Would be great if a solution was suggested! thank you.


Solution

  • I think you can do this with first_value(). The trick is to put a condition within the window function, turn on the ignore nulls option, and then use a window frame specifications that looks back on the previous rows having the same departure/arrival, not including the current row:

    select
        t.*,
        first_value(case when booked = 1 then flight_date end ignore nulls) over(
            partition by departure_code, arrival code
            order by flight_date desc
            rows between unbounded preceding and 1 preceding
        ) previous_flight_date
    from mytable t
    

    Actually a window max() would work too (and then, no need for ignore nulls):

    select
        t.*,
        max(case when booked = 1 then flight_date end) over(
            partition by departure_code, arrival code
            order by flight_date desc
            rows between unbounded preceding and 1 preceding
        ) previous_flight_date
    from mytable t