Search code examples
powerbiduplicatespowerquerym

Power BI: List duplicates from a table for a combination of columns


I have below the table sourced from Flat file:

DATE STATE CODE GROUP TYPE ITEM COST
01/02/2023 TX C1 G1 T1 I1 1.234
01/02/2023 TX C1 G1 T1 I2 1.234
01/02/2023 TX C1 G1 T1 I3 1.234
01/02/2023 TX C1 G1 T1 I4 1.234
01/03/2023 PL C2 G2 T3 I5 1.234
01/03/2023 PL C2 G2 T3 I6 2.234
01/03/2023 PL C2 G2 T3 I7 3.234
01/03/2023 PL C2 G2 T3 I8 3.234

In a table visual, the requirement is:

  • For the combination of Date + State + Code + Group + Type, show all rows (if at least one row has a different COST).
  • The first 4 rows should not be displayed since the COST is the same for the above combination.
  • The last 4 rows must be displayed since the COSTS are different (note that the last row should also be displayed)

Note: I have tried groups by, inner join based on a composite key, removing duplicates and all but none yield the results that I am looking for.

Help is greatly appreciated


Solution

  • enter image description here

    In PowerQuery, do the following.

    1. Group by as follows: enter image description here

    2. Highlight first 5 columns and keep duplicates

    3. Expand the All column to get ITEM back.

    4. Remove the count column as not required

    let
        Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjDUNzDSNzIwMlZQ0lEKiQCRzoYg0h1MhoBJTzBpqGdkbKIUq0O0LiOydBmTpcsEU5cxXFeAD1gX2D3uYDIELO5pSpYuMxBpRKoucxBpTKouCyRdsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [DATE = _t, STATE = _t, CODE = _t, GROUP = _t, TYPE = _t, ITEM = _t, COST = _t]),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"DATE", type date}, {"STATE", type text}, {"CODE", type text}, {"GROUP", type text}, {"TYPE", type text}, {"ITEM", type text}, {"COST", type number}}),
        #"Grouped Rows" = Table.Group(#"Changed Type", {"DATE", "STATE", "CODE", "GROUP", "TYPE", "COST"}, {{"Count", each Table.RowCount(_), Int64.Type}, {"All", each _, type table [DATE=nullable date, STATE=nullable text, CODE=nullable text, GROUP=nullable text, TYPE=nullable text, ITEM=nullable text, COST=nullable number]}}),
        #"Kept Duplicates" = let columnNames = {"DATE", "STATE", "CODE", "GROUP", "TYPE"}, addCount = Table.Group(#"Grouped Rows", columnNames, {{"Count.1", Table.RowCount, type number}}), selectDuplicates = Table.SelectRows(addCount, each [Count.1] > 1), removeCount = Table.RemoveColumns(selectDuplicates, "Count.1") in Table.Join(#"Grouped Rows", columnNames, removeCount, columnNames, JoinKind.Inner),
        #"Expanded All" = Table.ExpandTableColumn(#"Kept Duplicates", "All", {"ITEM"}, {"ITEM"}),
        #"Removed Columns" = Table.RemoveColumns(#"Expanded All",{"Count"})
    in
        #"Removed Columns"