Search code examples
sqlgoogle-bigquerylagskip

Skip specific rows using LAG in sql


I have a table that looks like this:

enter image description here

Using the LAG function in SQL, I would like to perform the LAG on only values where star_date=end_date and get the past previous start_date record where start_date=end_date. That my end table will have an extra column like this: enter image description here

I hope my question is clear, any help is appreciated.


Solution

  • You can assign a group to these values and use that:

    select t.*,
           (case when start_date = end_date
                 then lag(start_date) over (partition by (case when start_date = end_date then 1 else 0 end) order by start_date)
            end) as prev_eq_start_date
    from t;
    

    Or:

    select t.*,
           (case when start_date = end_date
                 then lag(start_date) over (partition by start_date = end_date order by start_date)
            end) as prev_eq_start_date
    from t;
    

    Note if you data is big and most rows have different dates, then you might have a resources issue. In this case, an additional, unused partition by key can help:

    select t.*,
           (case when start_date = end_date
                 then lag(start_date) over (partition by (case when start_date = end_date then 1 else 2 end), (case when start_date <> end_date then start_date end) order by start_date)
            end) as prev_eq_start_date
    from t;
    

    This has no impact on the result but it can avoid a resources error caused by too many rows with different values.