Search code examples
sqlsql-servert-sqlgaps-and-islands

SQL Island with just one date column and duplicated criteria


I'm coming to ask for help, I have been trying to generate the islands for a data set of records but I wasn't able to do it, I have tried with things like Lag functions, generate row over each record, then generate the islands, but not getting the right result.

Reference of solutions that I've tried: Show Date Range in Custom Column - Gaps and Islands

I need to generate this data structure:

 | ChartShiftWeekdayDetailId | IslandStartsOn | IslandEndsOn | NoLinkingReason |
 |          3600             |   2021-06-25   |   2021-07-01 | 'Can't be .....'|
 |          3600             |   2021-07-02   |   2021-07-14 |       null      |
 |          3600             |   2021-07-15   |   2021-07-31 | 'Can't be .....'|
 |          3600             |   2021-08-01   |   2021-08-05 |       null      |

and this is a example of records that should generate the expected result.

ref dataset

I have achieve this result

enter image description here

But as you can see this not the expected result.

This is what I've tried:

SELECT *,
       CONCAT(NoLinkingReason,
              CAST(DATEDIFF(DAY,
                            LAG(DateValue, 1, CAST(ClearedData.DateValue AS datetime) - 1) OVER (PARTITION BY ChartShiftWeekdayDetailId,
                                                                                                              NoLinkingReason
                                                                                                 ORDER BY DateValue),
                            DateValue) AS varchar(10))) AS NumDays
FROM ClearedData;

I would really appreciate and thankful all your help.


Solution

  • After reviewing the link and other answers it's apparent the gaps haven't been defined correctly. This answer uses some data I made up. You could try something like this

    with 
    gaps_cte as (
        select *, case when lag(NoLinkingReason, 1, NoLinkingReason) 
                                over (partition by ChartShiftWeekdayDetailId order by DateValue)<>NoLinkingReason then 1 
                       when lag(NoLinkingReason, 1, NoLinkingReason) 
                                over (partition by ChartShiftWeekdayDetailId order by DateValue) is null and NoLinkingReason is not null then 1
                       when lag(NoLinkingReason, 1, NoLinkingReason) 
                                over (partition by ChartShiftWeekdayDetailId order by DateValue) is not null and NoLinkingReason is null then 1
                       else 0 end gap
        from (values (3600, cast('2021-06-25' as date), 'Something'),
                     (3600, cast('2021-06-26' as date), 'Something'),
                     (3600, cast('2021-06-27' as date), 'Nothing'),
                     (3600, cast('2021-06-28' as date), 'Nothing'),
                     (3600, cast('2021-06-29' as date), null),
                     (3600, cast('2021-06-30' as date), Null),
                     (3600, cast('2021-07-01' as date), 'Something'),
                     (3600, cast('2021-07-02' as date), 'Something')
                      ) v(ChartShiftWeekdayDetailId, DateValue, NoLinkingReason)),
    grps_cte as (
        select *, sum(gap) over (partition by ChartShiftWeekdayDetailId order by DateValue) grp
        from gaps_cte)
    select ChartShiftWeekdayDetailId, 
           min(DateVAlue) IslandStartsOn,
           max(DateVAlue) IslandEndsOn,
           max(NoLinkingReason) NoLinkingReason
    from grps_cte
    group by ChartShiftWeekdayDetailId, grp
    order by min(DateVAlue);
    
    ChartShiftWeekdayDetailId   IslandStartsOn  IslandEndsOn    NoLinkingReason
    3600                        2021-06-25      2021-06-26      Something
    3600                        2021-06-27      2021-06-28      Nothing
    3600                        2021-06-29      2021-06-30      NULL
    3600                        2021-07-01      2021-07-02      Something