I have a large spreadsheet with headers (180k+ rows) with unique IDs in A, start date in B, and end date in C. There are multiple rows per ID and the start and end dates overlap.
I need to find any gaps in the date ranges for each ID. I've written a few different formulas and macros, tried and tweaked VBA scripts I've found. I've attempted a power query and power pivot grasping at straws, but if Excel doesn't crash I'm not getting a usable output.
Example data:
ID | start | end |
---|---|---|
100 | 1/1/2015 | 3/1/2015 |
100 | 3/1/2015 | 1/1/2300 |
100 | 1/1/2018 | 1/1/2019 |
096 | 7/1/2020 | 1/1/2021 |
182 | 9/17/2017 | 1/1/2018 |
182 | 1/1/2018 | 1/1/2019 |
607 | 1/1/2015 | 9/1/2015 |
607 | 9/1/2015 | 1/1/2017 |
607 | 1/1/2018 | 1/1/2020 |
607 | 1/1/2021 | 1/1/2300 |
I would like to combine or consolidate these to remove extra lines for the IDs that do not have any gaps in the date range, but will leave an extra row for the IDs that do:
ID | start | end |
---|---|---|
100 | 1/1/2015 | 1/1/2300 |
096 | 7/1/2020 | 1/1/2021 |
182 | 9/17/2017 | 1/1/2019 |
607 | 1/1/2015 | 1/1/2017 |
607 | 1/1/2018 | 1/1/2020 |
607 | 1/1/2021 | 1/1/2300 |
I don't need it to combine; though, for presentations sake it would be nice. Also, I would settle for something that is able to tell me which IDs have a gap in the range, even if it doesn't combine the dates or remove extra rows.
I did find a script from another site that almost did the job, though because the date ranges can't all be sorted in proper order, like ID 100 in the example, it creates an extra line when it shouldn't.
Sub Consolidate_Dates()
Dim cell As Range
Dim Nextrow As Long
Dim Startdate As Date
Nextrow = Range("A" & Rows.Count).End(xlUp).Row + 2
Startdate = Range("B2").Value
Application.ScreenUpdating = False
For Each cell In Range("A2", Range("A2").End(xlDown))
If cell.Value <> cell.Offset(1).Value Or _
cell.Offset(0, 2).Value < cell.Offset(1, 1).Value - 1 Then
Range("A" & Nextrow).Resize(1, 3).Value = cell.Resize(1, 3).Value
Range("B" & Nextrow).Value = Startdate
Nextrow = Nextrow + 1
Startdate = cell.Offset(1, 1).Value
End If
Next cell
Application.ScreenUpdating = True
End sub
Here is a Power Query solution:
Please read the comments in the code and explore the applied steps window to understand the algorithm better, but:
note that many steps cannot be done from the UI
M Code
Paste into the Advanced Editor
ensure you change the table name in Line2 to your actual table name
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"start", type date}, {"end", type date}}),
//Turn each date range into a list
#"Added Custom" = Table.AddColumn(#"Changed Type", "dateList", each
List.Dates([start],
Duration.Days([end]-[start])+1,
#duration(1,0,0,0))),
//Group the rows by ID
/*Generate columns where
actual date ranges are combined into a list,
and a list of the Full date range for that ID*/
#"Grouped Rows" = Table.Group(#"Added Custom", {"ID"},
{{"All", each _, type table [ID=nullable number, start=nullable date, end=nullable date, dateList=list]},
{"combinedDates", each List.Distinct(List.Combine([dateList]))},
{"startToEnd", each List.Dates(List.Min([start]),
Duration.Days(List.Max([end])-List.Min([start]))+1,
#duration(1,0,0,0))}
}),
//if the full list and the combined list Match, then there are no gaps and return True else False
#"Added Custom1" = Table.AddColumn(#"Grouped Rows",
"Custom", each List.IsEmpty(List.Difference([startToEnd],[combinedDates]))),
#"Added Custom2" = Table.AddColumn(#"Added Custom1",
"start", each if [Custom] = false then null
else List.Min([combinedDates])),
#"Added Custom3" = Table.AddColumn(#"Added Custom2",
"end", each if [Custom] = false then null
else List.Max([combinedDates])),
//create the table of Trues which we will NOT expand
trueTbl = Table.SelectRows(#"Added Custom3", each [Custom] = true),
trueRemoveColumns = Table.RemoveColumns(trueTbl,
{"All", "combinedDates", "startToEnd","Custom"}),
trueTyped = Table.TransformColumnTypes(trueRemoveColumns,
{{"start", type date}, {"end", type date}}),
//create the table of False which we WILL expand
falseTbl = Table.SelectRows(#"Added Custom3", each [Custom] = false),
expandFalse = Table.ExpandTableColumn(falseTbl, "All",
{"start", "end"}, {"start.1", "end.1"}),
falseRemoveColumns = Table.RemoveColumns(expandFalse,
{"combinedDates", "startToEnd", "Custom", "start", "end"}),
falseRenameColumns = Table.RenameColumns(falseRemoveColumns,
{{"start.1", "start"}, {"end.1", "end"}}),
//Combine the tables
comb = Table.Combine({trueTyped, falseRenameColumns})
in
comb