Search code examples
sqlamazon-web-servicesamazon-redshiftmetabase

How can i group rows on sql base on condition


I am using redshift sql and would like to group users who has overlapping voucher period into a single row instead (showing the minimum start date and max end date)

For E.g if i have these records,

enter image description here

I would like to achieve this result using redshift

enter image description here

Explanation is tat since row 1 and row 2 has overlapping dates, I would like to just combine them together and get the min(Start_date) and max(End_Date)

I do not really know where to start. Tried using row_number to partition them but does not seem to work well. This is what I tried.

select 
    id, 
    start_date, 
    end_date, 
    lag(end_date, 1) over (partition by id order by start_date) as prev_end_date,
    row_number() over (partition by id, (case when prev_end_date >= start_date then 1 else 0) order by start_date) as rn
from users

Are there any suggestions out there? Thank you kind sirs.


Solution

  • This is a type of gaps-and-islands problem. Because the dates are arbitrary, let me suggest the following approach:

    • Use a cumulative max to get the maximum end_date before the current date.
    • Use logic to determine when there is no overall (i.e. a new period starts).
    • A cumulative sum of the starts provides an identifier for the group.
    • Then aggregate.

    As SQL:

    select id, min(start_date), max(end_date)
    from (select u.*,
                 sum(case when prev_end_date >= start_date then 0 else 1
                     end) over (partition by id
                                order by start_date, voucher_code
                                rows between unbounded preceding and current row
                               ) as grp
          from (select u.*,
                       max(end_date) over (partition by id
                                           order by start_date, voucher_code
                                           rows between unbounded preceding and 1 preceding
                                          ) as prev_end_date                            
                from users u
               ) u
          ) u
    group by id, grp;