Search code examples
powerbipowerquery

Aggregate overlapping time intervals, calculate outage times, improve performance


I have a very large table, millions of rows, with many columns, amongst them, Service, Start, and End. Service is a name, Start and End indicate outage times.

The table looks something like:

 Service | Start     | End
 LAN     | 1/1 12:00 | 3/1 12:00
 LAN     | 2/1 14:00 | 3/1 14:00
 WAN     | 5/1 10:00 | 7/1 08:00
 WAN     | 6/1 08:00 | 7/1 10:00

The aim is to create an aggregate table listing Service, date and outage length, something like:

 Service | Date      | Outage length
 LAN     | 1/1       | 12 h
 LAN     | 2/1       | 24 h
 LAN     | 3/1       | 14 h
 WAN     | 5/1       | 14 h
 WAN     | 6/1       | 24 h
 WAN     | 7/1       | 10 h

My workflow is as follows:

Step 1. To calculate overlapping times I have used this Stack Overflow solution. It works fine. Based on my input table, the output looks like:

 Service | Start     | End
 LAN     | 1/1 12:00 | 3/1 14:00
 WAN     | 5/1 10:00 | 6/1 10:00

Step 2. I've added a Date column, and a calculated Outage column to get the final result.

It works, big thanks to @horseyride for the original solution, but is it very slow and takes forever to run. I realised that there are a lot of records but I'm willing to bet that there is a better workflow algorithm which could speed up the process dramatically.


Solution

  • You may be better off with the OVERLAP function in powerbi

    If it helps, I think this works considerably (?) faster then the [Original Powerquery Answer][1] by (a) removing half of the initial date comparisons checks (b) grouping and only operating on one group at a time. Add or remove the Table.Buffers as needed, not sure they are helping here

    function process

    (xtable)=>
    // compare each list against all lists in column Custom, and merge those that overlap
    let Source= Table.Buffer(xtable),
    #"Added Custom"= Table.AddColumn( 
         Source,
        "Custom2",
            each List.Accumulate (
             Source[Custom],
             [Custom],
            (state,current)=> if List.ContainsAny(state,current) then List.Distinct(List.Combine({current,state})) else state
        )
    ),
    // count the number of changes made from original version. If this is not zero, we will recurse the changes
    x= List.Sum(List.Transform(List.Positions(#"Added Custom"[Custom]), each if #"Added Custom"[Custom]{_} = #"Added Custom"[Custom2]{_} then 0 else 1)),
    RemovePrioCustom= Table.RemoveColumns(#"Added Custom",{"Custom"}),
    AddNewCustom= Table.RenameColumns(RemovePrioCustom,{{"Custom2", "Custom"}}),
    recursive = if x=0 then AddNewCustom else @process( AddNewCustom)
    in recursive
    

    Query

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Start", type datetime}, {"End", type datetime}}),
    #"Grouped Rows" = Table.Group(#"Changed Type" , {"Group"}, {{"data", each 
         let  #"AddIndex" = Table.AddIndexColumn(_, "Index", 0, 1, Int64.Type),
         #"Round1" = Table.AddColumn(#"AddIndex", "Custom",  (i)=>Table.SelectRows(#"AddIndex", each 
             ([End]=null and (i[End]=null or [Start]<=i[End])) or //remove this row if you don't care about null dates to speed things up
              ([Start]>=i[Start] and [Start]<=i[End]) or 
             ([Start]<=i[Start] and [End]>=i[Start])
         )[Index]),
         #"Round2"= process(#"Round1"),
         #"GetStart" = Table.AddColumn(#"Round2", "StartMin", each List.Min(List.Transform([Custom], each #"Round2"[Start]{_})),type datetime),
         #"GetEnd"= Table.AddColumn(#"GetStart", "EndMax", each 
             let a=List.Transform([Custom], each #"GetStart"[End]{_})
             in if List.Count(a)-List.NonNullCount(a) > 0 then null else List.Max(a)),
         #"RemoveColumns" = Table.RemoveColumns(#"GetEnd",{"Start", "End", "Index", "Custom"}),
         #"CleanUp" = Table.Distinct(#"RemoveColumns", {"Group", "StartMin", "EndMax"})
    in #"CleanUp", type table }}),
    #"Expanded data" = Table.ExpandTableColumn(#"Grouped Rows", "data", {"Event ID", "StartMin", "EndMax"}, {"Event ID", "StartMin", "EndMax"})
    in #"Expanded data"