Search code examples
excelvbaexcel-formuladate-range

Combining overlapping dates and finding non-overlaps for multiple unique IDs


I have a large spreadsheet with headers (180k+ rows) with unique IDs in A, start date in B, and end date in C. There are multiple rows per ID and the start and end dates overlap.

I need to find any gaps in the date ranges for each ID. I've written a few different formulas and macros, tried and tweaked VBA scripts I've found. I've attempted a power query and power pivot grasping at straws, but if Excel doesn't crash I'm not getting a usable output.

Example data:

ID start end
100 1/1/2015 3/1/2015
100 3/1/2015 1/1/2300
100 1/1/2018 1/1/2019
096 7/1/2020 1/1/2021
182 9/17/2017 1/1/2018
182 1/1/2018 1/1/2019
607 1/1/2015 9/1/2015
607 9/1/2015 1/1/2017
607 1/1/2018 1/1/2020
607 1/1/2021 1/1/2300

I would like to combine or consolidate these to remove extra lines for the IDs that do not have any gaps in the date range, but will leave an extra row for the IDs that do:

ID start end
100 1/1/2015 1/1/2300
096 7/1/2020 1/1/2021
182 9/17/2017 1/1/2019
607 1/1/2015 1/1/2017
607 1/1/2018 1/1/2020
607 1/1/2021 1/1/2300

I don't need it to combine; though, for presentations sake it would be nice. Also, I would settle for something that is able to tell me which IDs have a gap in the range, even if it doesn't combine the dates or remove extra rows.

I did find a script from another site that almost did the job, though because the date ranges can't all be sorted in proper order, like ID 100 in the example, it creates an extra line when it shouldn't.

Sub Consolidate_Dates()
    
    Dim cell As Range
    Dim Nextrow As Long
    Dim Startdate As Date
    
    Nextrow = Range("A" & Rows.Count).End(xlUp).Row + 2
    Startdate = Range("B2").Value
    
    Application.ScreenUpdating = False
    For Each cell In Range("A2", Range("A2").End(xlDown))
        If cell.Value <> cell.Offset(1).Value Or _
           cell.Offset(0, 2).Value < cell.Offset(1, 1).Value - 1 Then
            Range("A" & Nextrow).Resize(1, 3).Value = cell.Resize(1, 3).Value
            Range("B" & Nextrow).Value = Startdate
            Nextrow = Nextrow + 1
            Startdate = cell.Offset(1, 1).Value
        End If
    Next cell
    Application.ScreenUpdating = True
End sub

Solution

  • Here is a Power Query solution:

    Please read the comments in the code and explore the applied steps window to understand the algorithm better, but:

    • create a List of the included dates in each range for each ID
      • combine them into a single list
    • create a List of ALL possible dates from the earliest date to the latest date for each ID
    • If all the dates in the "ALL" range are included in the combined list, then we have no gaps.
    • Create two separate tables
      • one with a Group for the no gap list
      • a second for the list with gaps which we then expand
    • Append the two tables.

    note that many steps cannot be done from the UI

    M Code

    Paste into the Advanced Editor

    ensure you change the table name in Line2 to your actual table name

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"start", type date}, {"end", type date}}),
    
    //Turn each date range into a list
        #"Added Custom" = Table.AddColumn(#"Changed Type", "dateList", each 
            List.Dates([start],
                        Duration.Days([end]-[start])+1,
                        #duration(1,0,0,0))),
    
      //Group the rows by ID
      /*Generate columns where 
          actual date ranges are combined into a list,
          and a list of the Full date range for that ID*/
        #"Grouped Rows" = Table.Group(#"Added Custom", {"ID"}, 
            {{"All", each _, type table [ID=nullable number, start=nullable date, end=nullable date, dateList=list]},
            {"combinedDates", each List.Distinct(List.Combine([dateList]))},
            {"startToEnd", each List.Dates(List.Min([start]),
                                    Duration.Days(List.Max([end])-List.Min([start]))+1,
                                    #duration(1,0,0,0))}        
            }),
    
      //if the full list and the combined list Match, then there are no gaps and return True else False        
        #"Added Custom1" = Table.AddColumn(#"Grouped Rows", 
              "Custom", each List.IsEmpty(List.Difference([startToEnd],[combinedDates]))),
        #"Added Custom2" = Table.AddColumn(#"Added Custom1", 
              "start", each if [Custom] = false then null
                    else List.Min([combinedDates])),
        #"Added Custom3" = Table.AddColumn(#"Added Custom2", 
              "end", each if [Custom] = false then null 
                    else List.Max([combinedDates])),
    
      //create the table of Trues which we will NOT expand
      trueTbl = Table.SelectRows(#"Added Custom3", each [Custom] = true),
        trueRemoveColumns = Table.RemoveColumns(trueTbl,
                {"All", "combinedDates", "startToEnd","Custom"}),
        trueTyped = Table.TransformColumnTypes(trueRemoveColumns,
                {{"start", type date}, {"end", type date}}),
    
       //create the table of False which we WILL expand 
      falseTbl = Table.SelectRows(#"Added Custom3", each [Custom] = false),
        expandFalse = Table.ExpandTableColumn(falseTbl, "All", 
                {"start", "end"}, {"start.1", "end.1"}),
        falseRemoveColumns = Table.RemoveColumns(expandFalse,
                {"combinedDates", "startToEnd", "Custom", "start", "end"}),
        falseRenameColumns = Table.RenameColumns(falseRemoveColumns,
                {{"start.1", "start"}, {"end.1", "end"}}),
    
    //Combine the tables
        comb = Table.Combine({trueTyped, falseRenameColumns})
    in 
       comb
    

    enter image description here