Search code examples
powerquerym

power query subtract row below from row above using multiple conditions


I am using Power Query in Excel and I need to calculate the duration on each "Door_side" using the Time column on a daily level for each individual user.

The data comes from a card based access system and is formatted as follows:

Date  Time   User_No   Door_side
03/12  08:59   User_05   Outside
03/12  09:00   User_33   Inside
03/12  09:01   User_10   Outside
03/12  09:01   User_04   Outside
03/12  09:02   User_26   Outside
03/12  09:03   User_19   Outside
03/12  09:03   User_15   Inside
03/12  09:04   User_31   Inside
03/12  09:05   User_31   Outside
03/12  09:06   User_15   Outside
03/12  09:06   User_06   Inside
03/12  09:06   User_06   Inside
03/12  09:06   User_06   Inside
03/12  09:08   User_32   Outside
03/12  09:09   User_10   Inside
03/12  09:09   User_13   Inside
03/12  09:10   User_10   Outside

I tried the following:

  1. Sorted the Rows by Date, User and Time;
  2. Added Index column;
  3. Created Custom column named PreviousTime;
  4. Calculated Duration (Time - PreviousTime).

The full code for the above mentioned steps is:

    let
    Source = Table,
     #"Sorted Rows" = Table.Sort(Source,{{"Date", Order.Ascending}, {"User_No", Order.Ascending}, {"Time", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    #"Added Custom" = Table.AddColumn(#"Added Index", "PreviousTime", each try 
if List.AllTrue(
{[User_No]=#"Added Index"[User_No]{[Index]-1},[Date]=#"Added Index"[Date]{[Index]-1}
}
)
then try #"Added Index"[Time]{[Index]-1} otherwise [Time]
else [Time]
otherwise [Time]),
    Duration = Table.AddColumn(#"Added Custom", "Duration", each [Time] - [PreviousTime], type duration)
in
    Duration

This works on small data sets but causes functionality issues and completely fails on a larger amount of data. I am fairly new to Power Query and M so I just can't figure out what exactly from the custom column formula causes issues or how to approach this in another way.

I tried to keep the above code as part of my query and also to use it as a function but there is not much difference functionality wise between these two approaches. The processed table will be sent to the Data Model but I was hoping to obtain the duration in Power Query rather than in Power Pivot. A big thank you in advance!


To detail on the task a bit more I uploaded a reduced version of the data, for 3 users for the month of December. You can find it here: https://1drv.ms/x/s!AocQlL_KAzymgwhqiKxSL5JMZheL.

What I want to achieve is to calculate the duration between the timestamps based on user and date. As a plus I do not have users working past midnight so all timestamps for a specific shift will be within the same date.

An example of the desired outcome can be found within the workbook as well and looks like this (calculated in Excel):

Date    Time    User    Door_side    Duration
03/12   06:54   User_1  Outside 
03/12   07:26   User_1  Inside    00:32:00
03/12   07:27   User_1  Outside   00:01:00
03/12   07:44   User_1  Inside    00:17:00
03/12   07:52   User_1  Outside   00:08:00
03/12   08:35   User_1  Inside    00:43:00
03/12   08:36   User_1  Outside   00:01:00
03/12   11:50   User_1  Inside    03:14:00
03/12   12:01   User_1  Outside   00:11:00
03/12   13:27   User_1  Inside    01:26:00
03/12   13:43   User_1  Outside   00:16:00
03/12   14:57   User_1  Inside    01:14:00
03/12   15:20   User_1  Inside    00:23:00
03/12   15:26   User_1  Outside   00:06:00
03/12   15:34   User_1  Inside    00:08:00

Because the data contains all users and multiple days I am attempting to do the calculations within tables grouped by Date and User.


I spent some time testing all 3 approaches presented below (List.Min, Table.FirstN & nested tables) and on a limited data set all of them do a great job.

However, when applied to a larger dataset (I have around 20000 rows for 1 month) the nested tables approach seems to be the fastest.

Thank you Eugene and Marc for helping and, more important, for teaching me something new.


Solution

  • Here's a different approach. It relies on working in nested tables.

    I started with your data from your spreadsheet, in a table named Table1: enter image description here

    In Power Query, using Table1 as the source, I split the Booking Time column, renamed the resulting date and time columns, filtered, out the - Doorside entries, and sorted per your guidance: enter image description here

    Then I grouped by Booking Date and User: enter image description here enter image description here

    Then I added an index column within each of the nested tables, in a new custom column: enter image description here enter image description here

    Then I added a new column with the previous time within each of the nested tables, in a new custom column: enter image description here enter image description here (The error you see here is because there is no previous time.)

    Then I added a new column with the corrections for the errors from when I added the previous date, in each of the nested tables, in a new custom column. I figured I would "correct" caused by no previous times, by replacing the error with the "current" Booking Time, which would result in a duration of zero: enter image description here enter image description here

    Then I added a new column with the durations calculated in each of the nested tables, in a new custom column: enter image description here enter image description here

    Then I removed all columns except the last one I had added, which I had called AddDuration: enter image description here

    Then I expanded the AddDuration column: enter image description here

    Here's my M code:

    let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-US"), "Booking time", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Booking time.1", "Booking time.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Booking time.1", "Booking Date"}, {"Booking time.2", "Booking Time"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Booking Date", type date}, {"Booking Time", type time}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Doorside] <> "-")),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Booking Date", Order.Ascending}, {"User", Order.Ascending}, {"Booking Time", Order.Ascending}}),
    #"Grouped Rows" = Table.Group(#"Sorted Rows", {"Booking Date", "User"}, {{"AllData", each _, type table}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "AddIndex", each Table.AddIndexColumn([AllData],"Index",0,1)),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "AddPreviousTime", each let tblName = [AddIndex] in Table.AddColumn([AddIndex],"Previous Time",each tblName{[Index]-1}[Booking Time], type time)),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "CorrectErrors", each Table.ReplaceErrorValues([AddPreviousTime], {{"Previous Time", [AddPreviousTime][Booking Time]{0}}})),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "AddDuration", each Table.AddColumn([CorrectErrors],"Duration", each [Booking Time] - [Previous Time], type duration)),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"AddDuration"}),
    #"Expanded AddDuration" = Table.ExpandTableColumn(#"Removed Other Columns", "AddDuration", {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"}, {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"})
    in
    #"Expanded AddDuration"