Search code examples
powerbidaxpowerquerym

Clean up table in Power BI


I am attempting to load several Excel files into Power BI. These files are pretty small (<= ~1k rows). One of these sources must be cleaned up. In particular, one of its columns has some bad data. The correct data is stored in another Excel file. For example:

table bad:
ID    col1
1     0
2     0.5
3     2
4     -3

table correct:
ID    colx
2     1
4     5

desired output:
ID    col1
1     0
2     1
3     2
4     5

In SQL or other data visualization tools, I would left join the bad table to the clean table and then coalesce the bad values and correct values. I know that I have some options on how to implement this in Power BI. I think one option is to implement it in the query editor (i.e., M). I think another option is to implement it in the data model (i.e., DAX). Which option is best? And, what would the implementation look like (e.g., if M, then what does the query look like)?


Solution

  • While you can do this in DAX, I'd suggest doing it in the query editor. The steps would look roughly like this:

    1. Merge the Correct table into the Bad table using a left outer join on the ID columns.

    Merge Tables

    1. Expand out the Correct table to just get the Colx column.

    Expand Column

    1. Create a custom column to pick the values you want. (Add Column > Custom Column)

      if [Colx] = null then [Col1] else [Colx]

    Custom Column

    1. You can remove the Col1 and Colx column if you want or just keep them. If you delete Col1, you can rename the Col2 column to be Col1.

    If you don't want the source tables floating around, you can do all of the above in a single query similar to this:

    let
        BadSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTJQitWJVjICsfRMwWxjINsIzDIBsnSNlWJjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Col1 = _t]),
        CorrectSource = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlLSUTJUitWJVjIBskyVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Colx = _t]),
        Bad = Table.TransformColumnTypes(BadSource,{{"ID", Int64.Type}, {"Col1", type number}}),
        Correct = Table.TransformColumnTypes(CorrectSource,{{"ID", Int64.Type}, {"Colx", type number}}),
        #"Merged Queries" = Table.NestedJoin(Bad,{"ID"},Correct,{"ID"},"Correct",JoinKind.LeftOuter),
        #"Expanded Correct" = Table.ExpandTableColumn(#"Merged Queries", "Correct", {"Colx"}, {"Colx"}),
        #"Added Custom" = Table.AddColumn(#"Expanded Correct", "Col2", each if [Colx] = null then [Col1] else [Colx]),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Col1", "Colx"}),
        #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Col2", "Col1"}})
    in
        #"Renamed Columns"