I got some help from @teylyn in a previous thread to get this code to work.
I have run into some more trouble regarding the operation time for the query.
The task:
The task is to combine two separate Contact lists into one completely new list.
I also need to remove duplicates who share the same name and company name.
Lastly i need to remove contacts that share the same CompanyID but where the Company name differs. The contacts from the Primary list should be kept in case a conflict occurs.
The code below works but takes far to long time to complete.
The list contains a total of approx 8500 contacts.
It takes 2.5 seconds to iterate through each contact which adds upp to approx 6 h until it's done.
My question:
Why does this operation take so long and is there any way to make it quicker?
let
Source = Table.Combine({PrimaryContacts, SecondaryContacts}),
#"Removed duplicates" = Table.Distinct(Source, {"CompanyID", "FirstName", "LastName"}),
#"Sorted rows" = Table.Sort(#"Removed duplicates",{{"CompanyID", Order.Ascending}, {"Email", Order.Descending}}),
#"Filtered rows" = Table.SelectRows(#"Sorted rows", each ["FirstName"] <> null and ["FirstName"] <> ""),
#"Added index" = Table.AddIndexColumn(#"Filtered rows", "Index", 10000, 1),
#"Renamed columns" = Table.RenameColumns(#"Added index",{{"Index", "ContactID"}}),
#"Reordered columns" = Table.ReorderColumns(#"Renamed columns",{"ContactID", "CompanyID", "CompanyName", "FirstName", "LastName}), // I have removed 10 columns for privacy reasons
#"Added index1" = Table.AddIndexColumn(#"Reordered columns", "Index", 0, 1),
#"Filtered rows1" = Table.SelectRows(#"Added index1", each (["ContactID"] = 10000 or ["ContactID"] = 10001 or ["ContactID"] = 10002 or ["ContactID"] = 10003 or ["ContactID"] = 10004 or ["ContactID"] = 10005)),
/* The filter above is temporary to be able to check if next step works */
/* It is the step below that takes too much time to finish */
#"Add custom" = Table.AddColumn(#"Filtered rows1", "Delete", each if ["CompanyID"]= #"Reordered columns"{[Index]-1}["CompanyID"] and ["CompanyName"]<> #"Reordered columns"{[Index]-1}["CompanyName"] then "Delete" else null)
in
#"Add custom"
Probably there is a problem with the first row where [Index] = 0 so [Index] - 1 = -1 which is out of range.
In general, it is not very efficient to refer to table rows using a row index. Instead it is a better to add 2 index columns: 1 starting with 0, the other starting with 1. Next you merge the table with itself, using the 0-based index as key for the first table and the 1-based index as key for the second table. As a result, you get a column with nested tables, containing the data from the previous row. If you name this column "Previous" and expand the required column, using the original name as prefix, you have fields from the current row combined with the fields from the previous row (prefixed with "Previous.") next to each other and proceed from there.