Search code examples
excelpowerbipowerqueryexcel-2016

Remove duplicates with exception


I have PDF I'm converting to a excel table and removing duplicates that the process makes, I was able to tune it down enough to extract some useful information, but now I hit a roadblock.

let
Fonte = Pdf.Tables(File.Contents("S:\Aplicativo\API Relatório\RelSC18102023_085032456.pdf"), [Implementation="1.3"]),
#"Data Expandido" = Table.ExpandTableColumn(Fonte, "Data", {"Column1", "Column2", "Column3","Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13"},{"Data.Column1", "Data.Column2", "Data.Column3", "Data.Column4", "Data.Column5", "Data.Column6", "Data.Column7", "Data.Column8","Data.Column9", "Data.Column10", "Data.Column11", "Data.Column12", "Data.Column13"}),
#"Colunas Removidas" = Table.RemoveColumns(#"Data Expandido",{"Id", "Name", "Kind"}),
#"Remover Nulls" = Table.FromColumns(List.Zip(List.RemoveNulls(List.Transform(Table.ToRows(#"Colunas Removidas"), each if List.NonNullCount(_)=0 then null else List.RemoveNulls(_))))),
#"Duplicatas Removidas" = Table.Distinct(#"Remover Nulls"),
#"Duplicatas Removidas1" = Table.Distinct(#"Duplicatas Removidas", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7"})
in
#"Duplicatas Removidas1"

When I do the last Table.Distinct, it removes all duplicates from the table... including a line that show the client id when it repeats (the table goes something like, order id, client id, then other order info). What I wanted was to put a conditional before the Table.Distinct to the tune of "if column1 = "Client:" do nothing, else Table.Distinct (etc) in blablabla".

Would that be possible? If so, what is the correct sintax to achieve it? It would also help me with another problem with the table (where sometimes a order info is equal to another order and is removed as a duplicate).


Solution

  • Filter Column<>"Client:", do Table.Distinct, add back Column1="Client:"

    #"Filtered Rows" = Table.Distinct(Table.SelectRows(#"PriorStepNameHere", each [Column1] <> "Client:"))& Table.SelectRows(#"PriorStepNameHere", each [Column1] = "Client:")
    

    To preserve row sorting, add column ... index column ... filter/unduplicate, then resort afterwards

    let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    #"Filtered Rows" = Table.Distinct(Table.SelectRows(#"Added Index", each [Column1] <> "Client:"),List.Difference(Table.ColumnNames(#"Added Index"),{"Index"}))& Table.SelectRows(#"Added Index", each [Column1] = "Client:"),
    #"Sorted Rows" = Table.Sort(#"Filtered Rows",{{"Index", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index"})
    in #"Removed Columns"