I have 2 tables, #"Lead Data1" and #"Enrolment Data", with ~300k and ~16k rows respectively with key (Lead Number). I am using a separate query to pull #"Enrolment Data". I want to update the #"Lead Data1" columns {"Created On", "Course", "Program", "Business"} from the #"Enrolment Data", retaining the other columns. If any key (Lead Number) in #"Enrolment Data" is not found in #"Lead Data1", those records need to be appended to it.
I tried the below code:
let
Source = Excel.Workbook(File.Contents("xxx.xlsb"), null, true),
#"Lead Data1" = Source{[Name="Lead Data"]}[Data],
#"Promoted Headers" = Table.PromoteHeaders(#"Lead Data1", [PromoteAllScalars=true]),
#"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers",{"Created On", "Owner", "Lead Number", "Source", "Source type", "TL", "Course", "Program", "Business"}),
Here I remove all unnecessary columns, these are the ones I need in my output from combining #"Lead Data1" and #"Enrolment Data". I will use #"Removed Other Columns" table again to append non-matched items.
#"Removed Other Columns1" = Table.SelectColumns(#"Removed Other Columns",{"Owner", "Lead Number", "Source", "Source type", "TL"}),
I only keep columns I need to merge to #"Enrolment Data" (which don't have these)
#"Merged Queries" = Table.Join(#"Removed Other Columns1", {"Lead Number"}, #"Enrolment Data", {"Lead Number"}, JoinKind.RightOuter, 2),
I am merging with #"Enrolment Data" and not the other way around because it has much lower number of rows (if that makes a difference). This would give me the "updated" rows, including those with keys not matching #"Lead Data1".
#"Rows to Remove" = Table.Column(#"Merged Queries","Lead Number"),
Once merged, I select the list of matched keys (updated rows) in #"Enrolment Data".
#"Selected Rows" = Table.RemoveMatchingRows(#"Removed Other Columns", each List.Contains(#"Rows to Remove",[#"Lead Number"])),
I remove those rows from #"Lead Data1" (giving "non-updated" rows)
#"Appended Queries" = Table.Combine({#"Merged Queries", #"Selected Rows"})
I append the "Updated" and "Non-Updated" rows.
in #"Appended Queries"
However, the query throws an error of "Data Source not found" - I think this might be because I am referencing #"Removed Other Columns" table wrong. Is there any way around it, or my approach itself is faulty? If so, any solution will be highly appreciated.
I don't know where your error is coming from. With your data, I could not get the code you posted to produce anything meaningful. So this may not help your problem, but it will work if you can read your two tables into PQ.
This is a method of merging your two tables. In this case, the assumption is that you have read the two tables into individual queries, and are then merging those two queries.
let
//Join the two tables
Source = Table.NestedJoin(Lead_Data, {"Lead Number"}, Enrolment_Data, {"Lead Number"}, "Enrolment_Data", JoinKind.FullOuter),
//Replace Lead number if Enrolment_Data has a new entry
#"Update Lead Number" = Table.ReplaceValue(
Source,
each [Lead Number],
each [Enrolment_Data][Lead Number]{0},
(x,y,z)=>y??z,
{"Lead Number"}),
//Update other columns with entries from Enrolment_data if any are present
// Note the custom replacer function will maintain the Text type of the columns
#"Update Course/Program/Business" = List.Accumulate(
{"Course","Program","Business"},
#"Update Lead Number",
(s,c)=>Table.ReplaceValue(
s,
each Record.Field(_,c),
each Table.Column([Enrolment_Data],c){0},
(x,y,z) as text => if z <> null then z else y,
{c})
),
//Remove Enrolment_Data table column
#"Removed Columns" = Table.RemoveColumns(#"Update Course/Program/Business",{"Enrolment_Data"}),
//Sort by lead number if needed
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Lead Number", Order.Ascending}})
in
#"Sorted Rows"
Alternate Method
let
//Join the two tables two ways
#"Right Outer Join" = Table.NestedJoin(Lead_Data, {"Lead Number"}, Enrolment_Data, {"Lead Number"}, "Enrols", JoinKind.RightOuter),
#"Left-Anti Join" = Table.NestedJoin(Lead_Data,"Lead Number", Enrolment_Data,"Lead Number", "Leads", JoinKind.LeftAnti),
#"Unmodified Leads" = Table.RemoveColumns(#"Left-Anti Join","Leads"),
//Remove columns in Right Outer that are present in the table, then expand
#"Enrol colNames" = Table.ColumnNames(#"Right Outer Join"[Enrols]{0}),
#"Remove Columns" = Table.RemoveColumns(#"Right Outer Join",#"Enrol colNames"),
#"Modified Leads" = Table.ExpandTableColumn(#"Remove Columns","Enrols",#"Enrol colNames"),
//Append the tables
#"All Leads" = Table.Combine({#"Unmodified Leads",#"Modified Leads"}),
//Sort by Lead Number
Sort = Table.Sort(#"All Leads", {"Lead Number", Order.Ascending})
in
Sort