Search code examples
excelduplicatespowerquery

Removing duplicate values in one column based on group of values in another column in power query


I have a query in Power Query Editor that has everything i need. However, due to splitting columns and other steps within the process, i have duplicate values in a column that hyper inflates my numbers. Is there an applied step that i can add to the end of my query that will remove duplicate values in "Column 4" based on the value in "Column 5". Column 5 is a unique ID and i only need the first value under Column 4 based on the ID.

The example below is just a small example as my current output has 51 columns and quite a bit of applied steps in the current query.

enter image description here

duplicates overall in the column are okay, but cannot have duplicates based on the value in another column and the basic remove duplicates removes all.


Solution

  • A relatively simple way. Keeps the first non-blank for Column4

    #"PriorStepName" = .............,
    #"Grouped Rows" = Table.Group(#"PriorStepName", {"Column5"}, {{"data", each 
        let a = _,
        b = Table.RemoveColumns(a,{"Column4"}),
        c = {{List.RemoveNulls(a[Column4]){0}}},
        d = Table.FromColumns(Table.ToColumns(b) & c,Table.ColumnNames(b)&{"Column4"})
        in d , type table }}),
    #"Removed Columns" = Table.RemoveColumns(#"Grouped Rows",{"Column5"}),
    #"Expanded data" = Table.ExpandTableColumn(#"Removed Columns", "data", Table.ColumnNames(#"PriorStepName"),Table.ColumnNames(#"PriorStepName"))
    in  #"Expanded data"  
    

    enter image description here