I am using Power Query in Excel and I need to calculate the duration on each "Door_side" using the Time column on a daily level for each individual user.
The data comes from a card based access system and is formatted as follows:
Date Time User_No Door_side
03/12 08:59 User_05 Outside
03/12 09:00 User_33 Inside
03/12 09:01 User_10 Outside
03/12 09:01 User_04 Outside
03/12 09:02 User_26 Outside
03/12 09:03 User_19 Outside
03/12 09:03 User_15 Inside
03/12 09:04 User_31 Inside
03/12 09:05 User_31 Outside
03/12 09:06 User_15 Outside
03/12 09:06 User_06 Inside
03/12 09:06 User_06 Inside
03/12 09:06 User_06 Inside
03/12 09:08 User_32 Outside
03/12 09:09 User_10 Inside
03/12 09:09 User_13 Inside
03/12 09:10 User_10 Outside
I tried the following:
The full code for the above mentioned steps is:
let
Source = Table,
#"Sorted Rows" = Table.Sort(Source,{{"Date", Order.Ascending}, {"User_No", Order.Ascending}, {"Time", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "PreviousTime", each try
if List.AllTrue(
{[User_No]=#"Added Index"[User_No]{[Index]-1},[Date]=#"Added Index"[Date]{[Index]-1}
}
)
then try #"Added Index"[Time]{[Index]-1} otherwise [Time]
else [Time]
otherwise [Time]),
Duration = Table.AddColumn(#"Added Custom", "Duration", each [Time] - [PreviousTime], type duration)
in
Duration
This works on small data sets but causes functionality issues and completely fails on a larger amount of data. I am fairly new to Power Query and M so I just can't figure out what exactly from the custom column formula causes issues or how to approach this in another way.
I tried to keep the above code as part of my query and also to use it as a function but there is not much difference functionality wise between these two approaches. The processed table will be sent to the Data Model but I was hoping to obtain the duration in Power Query rather than in Power Pivot. A big thank you in advance!
To detail on the task a bit more I uploaded a reduced version of the data, for 3 users for the month of December. You can find it here: https://1drv.ms/x/s!AocQlL_KAzymgwhqiKxSL5JMZheL.
What I want to achieve is to calculate the duration between the timestamps based on user and date. As a plus I do not have users working past midnight so all timestamps for a specific shift will be within the same date.
An example of the desired outcome can be found within the workbook as well and looks like this (calculated in Excel):
Date Time User Door_side Duration
03/12 06:54 User_1 Outside
03/12 07:26 User_1 Inside 00:32:00
03/12 07:27 User_1 Outside 00:01:00
03/12 07:44 User_1 Inside 00:17:00
03/12 07:52 User_1 Outside 00:08:00
03/12 08:35 User_1 Inside 00:43:00
03/12 08:36 User_1 Outside 00:01:00
03/12 11:50 User_1 Inside 03:14:00
03/12 12:01 User_1 Outside 00:11:00
03/12 13:27 User_1 Inside 01:26:00
03/12 13:43 User_1 Outside 00:16:00
03/12 14:57 User_1 Inside 01:14:00
03/12 15:20 User_1 Inside 00:23:00
03/12 15:26 User_1 Outside 00:06:00
03/12 15:34 User_1 Inside 00:08:00
Because the data contains all users and multiple days I am attempting to do the calculations within tables grouped by Date and User.
I spent some time testing all 3 approaches presented below (List.Min, Table.FirstN & nested tables) and on a limited data set all of them do a great job.
However, when applied to a larger dataset (I have around 20000 rows for 1 month) the nested tables approach seems to be the fastest.
Thank you Eugene and Marc for helping and, more important, for teaching me something new.
Here's a different approach. It relies on working in nested tables.
I started with your data from your spreadsheet, in a table named Table1:
In Power Query, using Table1 as the source, I split the Booking Time column, renamed the resulting date and time columns, filtered, out the - Doorside entries, and sorted per your guidance:
Then I grouped by Booking Date and User:
Then I added an index column within each of the nested tables, in a new custom column:
Then I added a new column with the previous time within each of the nested tables, in a new custom column: (The error you see here is because there is no previous time.)
Then I added a new column with the corrections for the errors from when I added the previous date, in each of the nested tables, in a new custom column. I figured I would "correct" caused by no previous times, by replacing the error with the "current" Booking Time, which would result in a duration of zero:
Then I added a new column with the durations calculated in each of the nested tables, in a new custom column:
Then I removed all columns except the last one I had added, which I had called AddDuration:
Then I expanded the AddDuration column:
Here's my M code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Split Column by Delimiter" = Table.SplitColumn(Table.TransformColumnTypes(Source, {{"Booking time", type text}}, "en-US"), "Booking time", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Booking time.1", "Booking time.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"Booking time.1", "Booking Date"}, {"Booking time.2", "Booking Time"}}),
#"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Booking Date", type date}, {"Booking Time", type time}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Doorside] <> "-")),
#"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Booking Date", Order.Ascending}, {"User", Order.Ascending}, {"Booking Time", Order.Ascending}}),
#"Grouped Rows" = Table.Group(#"Sorted Rows", {"Booking Date", "User"}, {{"AllData", each _, type table}}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "AddIndex", each Table.AddIndexColumn([AllData],"Index",0,1)),
#"Added Custom1" = Table.AddColumn(#"Added Custom", "AddPreviousTime", each let tblName = [AddIndex] in Table.AddColumn([AddIndex],"Previous Time",each tblName{[Index]-1}[Booking Time], type time)),
#"Added Custom2" = Table.AddColumn(#"Added Custom1", "CorrectErrors", each Table.ReplaceErrorValues([AddPreviousTime], {{"Previous Time", [AddPreviousTime][Booking Time]{0}}})),
#"Added Custom3" = Table.AddColumn(#"Added Custom2", "AddDuration", each Table.AddColumn([CorrectErrors],"Duration", each [Booking Time] - [Previous Time], type duration)),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom3",{"AddDuration"}),
#"Expanded AddDuration" = Table.ExpandTableColumn(#"Removed Other Columns", "AddDuration", {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"}, {"Booking Date", "Booking Time", "User", "Doorside", "Index", "Previous Time", "Duration"})
in
#"Expanded AddDuration"