Search code examples
powerqueryaggregationcustom-function

Power Query M - Group by Column Value with Custom Aggregation (Percentile)


I am trying to calculate percentiles by group (from column values ex: hours by department, sales by region, etc.) within power query. This same logic could be used for other custom group aggregation. After lots of searching, I found 2 potential approaches.

Approach 1:

this archived article which looked to have the perfect answer. Nothing else I could find comes close.

The solution from there is the following custom function:

//PercentileInclusive Function
(inputSeries as list, percentile as number) =>
let
    SeriesCount = List.Count(inputSeries),
    PercentileRank = percentile * (SeriesCount - 1) + 1, //percentile value between 0 and 1
    PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
    PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
    Percentile1 = List.Max(List.MinN(inputSeries, PercentileRankRoundedDown)),
    Percentile2 = List.Max(List.MinN(inputSeries, PercentileRankRoundedUp)),
    PercentileInclusive = Percentile1 + (Percentile2 - Percentile1) * (PercentileRank - PercentileRankRoundedDown)
in
    PercentileInclusive 

Combined with a step in your table to group appropriately and use the function:

=Table.Group(TableName, {"Grouping Column"}, {{"New Column name", each
    PercentileInclusive(TableName[Column to calculate Percentile of], percentile # between 0 and 1)}})

[edited to correct the typo Ron R. pointed out and remove unnecessary detail]

Example input:

Pen Type Units Sold
Ball-Point 6,109
Ball-Point 3,085
Ball-Point 1,970
Ball-Point 8,190
Ball-Point 6,006
Ball-Point 2,671
Ball-Point 6,875
Roller 778
Roller 9,329
Roller 7,781
Roller 4,182
Roller 2,016
Roller 5,785
Roller 1,411

Desired output for a 25% inclusive percentile grouped by Pen Type:

Pen Type 0.25 Inclusive Percentile (Correct)
Ball-Point 2,878
Roller 1,714

Notes: No decimals shown above, calculated with Excel's PERCENTILE.INC function.

Approach 1 works great.

Approach 2: Here is an alternate Power Query solution I tried. It is a single step with no custom function. It seems like it should do the trick, but I can't figure out a way to make the conditional check be row based. Something needs to go where I have //Condition// that tells it which rows belong in the current rows group, but no matter what I try it does not work. It either breaks, or gives a percentile for everything, ignoring the grouping.

=List.Percentile(Table.Column(Table.SelectRows(#"Previous Step Name", //Condition//), "Column to calculate percentile of"), percentile # 0 to 1)

Any ideas how to make approach 2 work?


Solution

  • It appears your Table.Group function is incorrectly specified.

    Where my previous step was #"Changed Type", the following works:

        #"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
            {"Percentile", each fnPercentileINC([Units Sold],0.25)}})
    

    Original Data
    enter image description here

    M Code

    let
        Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Pen Type", type text}, {"Units Sold", Int64.Type}}),
        #"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
            {"Percentile", each fnPercentileINC([Units Sold],0.25), type number}})
    in
        #"Grouped Rows"
    

    Result
    enter image description here

    Edit: For your approach #2, without a custom function, you can merely use List.Percentile as an aggregation in the Table.Group function:

       #"Grouped Rows" = Table.Group(#"Changed Type", {"Pen Type"}, {
            {"25th Percentile", each List.Percentile([Units Sold],0.25)}       
            })