Search code examples
excelpowerquerycountif

Power Query "COUNTIFS" takes long time to load for large data


I have a data as below. The real data is much larger and with more columns. This is records of hotel bookings.

enter image description here

and I want it to be summarized to show each day how many rooms of each type are booked with the output as below:

enter image description here

With such the small data the below code in Power Query works well. However, with the real data which has approximately 16k rows, it takes more than half and hour to load the data.

let
    Source = List.Dates(Date.From(List.Min(Bookings[Arrival Date])),Duration.Days(DateTime.Date(DateTime.LocalNow()) - Date.From(List.Min(Bookings[Arrival Date]))) + 1,#duration(1,0,0,0)),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "Date"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type date}}),
    #"Added Standard" = Table.AddColumn(#"Changed Type", "Standard", each List.Count(
    Table.SelectRows(
        Bookings,
        (Standard) => Standard[Arrival Date] <= [Date]
        and
        Standard[Departure Date]>[Date]
        and
        Standard[Room Type]="Standard"
    )[Booking Number]
)),
    #"Added Deluxe" = Table.AddColumn(#"Added Standard", "Deluxe", each List.Count(
    Table.SelectRows(
        Bookings,
        (Deluxe) => Deluxe[Arrival Date] <= [Date]
        and
        Deluxe[Departure Date]>[Date]
        and
        Deluxe[Room Type]="Deluxe"
    )[Booking Number]
)),
    #"Added Suite" = Table.AddColumn(#"Added Deluxe", "Suite", each List.Count(
    Table.SelectRows(
        Bookings,
        (Suite) => Suite[Arrival Date] <= [Date]
        and
        Suite[Departure Date]>[Date]
        and
        Suite[Room Type]="Suite"
    )[Booking Number]
)),
    #"Changed to Number" = Table.TransformColumnTypes(#"Added Suite",{{"Standard", Int64.Type}, {"Deluxe", Int64.Type}, {"Suite", Int64.Type}})
in
    #"Changed to Number"

Is there any other solutions which work for large data.


Solution

  • I am assuming that when you write "loading the data" you really mean running the entire query, and not just the loading of data into the query.

    Table.SelectRows is relatively slow.

    This should work more quickly. If it does on your actual data, then it is easy to make whatever minor adjustments are needed.

    • Add a column with the complete list of dates for each booking
      • including the arrival date but not the departure date
    • Expand that date column so we have a single entry for each date/room_type
    • Group by date and room_type and aggregate with the Count function
    • Pivot the room_type function with a SUM aggregation on the Count column
    • Re-order the columns and replace the nulls with zeros (0)

    Note that this algorithm counts the arrival but not the departure dates. That can be easily changed if that is not the case.

    let
        Source = Excel.CurrentWorkbook(){[Name="Bookings"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{
            {"Booking Number", Int64.Type}, {"Arrival Date", type date}, {"Departure Date", type date}, {"Room Type", type text}}),
        
    //add column with all of the dates for each booking
    //This will include the arrival but not the departure date
    //easy to adjust if needed
        #"All Dates" = Table.AddColumn(#"Changed Type", "Dates", each List.Dates([Arrival Date], 
                             Duration.Days([Departure Date]-[Arrival Date]), 
                             #duration(1,0,0,0)), type {date}),
    
    //expand to one entry per date
        #"Expanded allDates" = Table.ExpandListColumn(#"All Dates", "Dates"),
    
    //Adjust Room Types for minor typos
        #"Adjust Room Types" = Table.TransformColumns(#"Expanded allDates", {"Room Type", each Text.Proper(Text.Trim(_))}),
    
    //Group by dates and room type
        #"Grouped Rows" = Table.Group(#"Adjust Room Types", {"Dates", "Room Type"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    
    //then pivot
        #"Pivoted Column" = Table.Pivot(#"Grouped Rows", List.Distinct(#"Grouped Rows"[#"Room Type"]), 
            "Room Type", "Count", List.Sum),
        #"Reordered Columns" = Table.ReorderColumns(#"Pivoted Column",{"Dates", "Standard", "Deluxe", "Suite"}),
        #"Replace nulls with zero" = Table.ReplaceValue(#"Reordered Columns",null,0,Replacer.ReplaceValue,{"Standard", "Deluxe", "Suite"})
    in
        #"Replace nulls with zero"
    

    Data
    enter image description here

    Results
    enter image description here