Search code examples
powerbidaxpowerquerypowerbi-desktopm

Extract the start and end date of a sick leave in power query (gaps and islands)


I would like to have a max and min date of the different types of absence for every employee for project. I would like to have it in Power BI but I don't know how to do it.

There are 3 columns that this should be grouped: Project ID, Person ID and Time Status.

I thought about grouping also on Month and Year but then if there will be situation that there is 2 sick leaves in the same month it will not work properly. Here is an example of data:

enter image description here

And the outcome should be like that: enter image description here

Thanks.

Here is the sample of the data:

Project ID  Person ID   Time Status     Date
10          1           Sick leave      21.06.2024
10          1           Sick leave      09.07.2024
10          1           Sick leave      10.07.2024
10          1           Sick leave      11.07.2024
10          1           Sick leave      12.07.2024
9           2           Annual leave    12.08.2024
9           2           Annual leave    13.08.2024
9           2           Annual leave    14.08.2024
9           2           Annual leave    15.08.2024
9           2           Annual leave    31.08.2024

Solution

  • If you have a large set of data, the following will execute significantly faster.

    It assumes the data is already sorted as you show in your example -- by dates ascending within each Project ID | Person ID. If they are not, then this will need to be sorted which will lengthen the processing time.

    It uses an Index column and the 4th and 5th arguments of the Table.Group function to ensure the grouping is by consecutive dates:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{
            {"Project ID", Int64.Type}, {"Person ID", Int64.Type}, {"Time Status", type text}, {"Date", type date}}),
        #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    
    
        #"Grouped Rows" = Table.Group(#"Added Index", {"Project ID", "Person ID", "Time Status","Date", "Index"}, {
            {"Start Date", each List.Min([Date]), type date},
            {"End Date", each List.Max([Date]), type date}
            }, GroupKind.Local,(x,y)=>Duration.Days(y[Date]-x[Date])- (y[Index]-x[Index]) ),
            
        #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Index"})
    in
        #"Removed Columns"