Search code examples
powerbipowerquerypowerbi-desktop

Dynamically transform column values by subtracting previous column


PROBLEM

In PowerQuery I need to go from this table (input) enter image description here

to this table (output)
enter image description here

where each YYYYMM column (except the first, in this case 202401) is transformed by subtracting the value in the column immediately to the left.
It's guaranteed there will be no skipped months in the sequence.
The number of month columns is unknown at each refresh, so I need to do this dynamically.


SO FAR
After researching I found a promising approach in this SO answer.

This works:

output = Table.TransformRows(
    input,
    (r) =>  Record.TransformFields(
        r,
        {{"202405", each _ - r[202404]},
        {"202404", each _ - r[202403]},
        {"202403", each _ - r[202402]},
        {"202402", each _ - r[202401]},
        {"202401", each _ - r[202312]}}
    )
)

but as you can see, this approach requires hardcoded values so it needs to be generalized.

Record.TransformFields accepts a list of TransformOperations, so I tried to generate the list of transformations dynamically, but ultimately failed and finally succeeded (see UPDATE below).

periods = List.Sort(List.Skip(Table.ColumnNames(input), 1), Order.Ascending)

output = Table.TransformRows(
    input,
    (r) =>  Record.TransformFields(
        r,
        List.Transform(
            List.Skip(periods, 1),
            each (p) => {p, each (e) => e - Record.Field(r, Text.From(Number.From(p) - 1))}
        )
    )
)

The code above doesn't work for two reasons:

a. List.Transform is not returning valid TransformOperations since each row transformation errors out with

Expression.Error: Expected a TransformOperations value.
Details:
    [List]

b. It wouldn't handle Januaries since the previous column would have a different year and the month wraps (eg.: when I need to subtract 202312 from 202401). I guess this could be handled with an if statement placed in the List.Transform transformation (subtract 89 if last digit is 1).

I also researched Table.TransformColumns but I believe the transformOperations can't access values outside of the column being transformed.

I am not even sure this is the right approach and I couldn't find anything else, so I'd appreciate any help with this.


UPDATE - WORKING BUT EXTREMELY SLOW SOLUTION
I managed to make the above approach work. I was incorrectly using each with the explicit function declaration (a) => something(a). I also plugged in the logic for when the year wraps.

periods = List.Sort(List.Skip(Table.ColumnNames(input), 1), Order.Ascending)

output = Table.TransformRows(
    input,
    (r) =>  Record.TransformFields(
        r,
        List.Transform(
            List.Skip(periods, 1),
            (p) => if Text.EndsWith(p, "01")
            then {p, (e) => e - Record.Field(r, Text.From(Number.From(p) - 89))}
            else {p, (e) => e - Record.Field(r, Text.From(Number.From(p) - 1))}
        )
    )
)  

The reason I am not using this answer for my own question is that this works fast on the test table I provided, but it is incredibly slow on my main table with hundreds, and possibly thousands of rows.

Not sure at this point if I should even attempt this, but I am pretty sure this can be done with reasonable performance. If I find anything more compelling I'll update/answer the question.


Solution

  • I'd be interested if this is any faster. Another approach would be to

    • Unpivot all except the ID column
    • Group by ID
    • Within each group
      • Add a Shifted Value column (to avoid using an Index column)
      • Do the subtractions
      • Pivot the results
    • Re-expand the groups

    Data
    enter image description here

    M Code

    let
        Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"ID"}, "Attribute", "Value"),
        #"Grouped Rows" = Table.Group(#"Unpivoted Other Columns", {"ID"}, {
            {"Subtract", (t)=>let 
                    #"Add Shifted" = 
                        Table.FromColumns(
                            Table.ToColumns(t) &
                            {{null} & List.RemoveLastN(t[Value])},
                            {"ID", "Attribute","Value","Shifted Value"}),
                    #"New Value" = Table.AddColumn(#"Add Shifted","New", each ([Value] - [Shifted Value])??[Value], type number),
                    #"Remove Columns" = Table.RemoveColumns(#"New Value",{"Value","Shifted Value"}),
                    #"Rename" = Table.RenameColumns(#"Remove Columns",{{"New","Value"}}),
                    #"Pivot" = Table.Pivot(#"Rename", #"Rename"[Attribute], "Attribute","Value")    
                in 
                    #"Pivot"
                }}),
        #"Expanded Subtract" = Table.ExpandTableColumn(#"Grouped Rows", "Subtract", List.RemoveFirstN(Table.ColumnNames(Source),1)),
        #"Type Data" = Table.TransformColumnTypes(#"Expanded Subtract", 
            List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),1), each {_, Int64.Type}))
    in
        #"Type Data"
    

    Results
    enter image description here

    Edit
    For a technique using the Table.ReplaceValue function, along with List.Accumulate, I suggest the following:

    let
        Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
        #"Change Type" = Table.TransformColumnTypes(Source,
            {{"ID", type text}} & List.Transform(List.RemoveFirstN(Table.ColumnNames(Source),1), each {_, Int64.Type})),
    
    //Column pairs to process
        cp= List.Reverse(List.RemoveLastN(
                List.Zip({List.RemoveFirstN(Table.ColumnNames(Source),1), List.RemoveFirstN(Table.ColumnNames(Source),2)})
                ,1)),
        
        #"Replace Values" = List.Accumulate(
            cp,
            #"Change Type",
            (s,c)=> Table.ReplaceValue(
                s,
                each Record.Field(_,c{1}),
                each Record.Field(_,c{0}),
                (x,y,z) as number => y-z,
                {c{1}}
            )
        )
    
    in
        #"Replace Values"