I'm trying to create a lookup table combining the my 3 source files primary keys columns, this way I won't have to do an outer join to find the missing records from each source and then append them together. I've found how to "combine" two source files but I can't figure out how to drill into the columns/fields lists so that I can select only Column 1 (or "Item Code" header name in the Excel files).
Here is the code I have so far to combine 2/3 files (as a trial):
let
Source = Table.Combine({Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_MRP_Dmd.xlsx"), null, true),
Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_Open_PO.xlsx"), null, true)})
in Source
If you've got a less than ideal data source (ie lots of irrelevant columns, duplicates in the data you want) then one way to avoid materialising a whole bunch of unnecessary data would be to perform all your transformations/filtering on nested table cells rather than loading all the data up just to remove columns/dupes.
The M code below should be a rough start that hopefully gets you on the way
let
//Adjust the Source step to refer to the relevant folder your 3 source files are saved in
Source = Folder.Files("CC:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources"),
//Filter the file list to leave just your 3 source files if required
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
//Remove all columns excep the Binary file column
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),
//Convert the binary file to the file data ie sheets, tables, named ranges etc - the same data you get when you use a file as a source
#"Workbook Data" = Table.TransformColumns(#"Removed Other Columns",{"Content", each Excel.Workbook(_)}),
//Filter the nested file data table cell to select the sheet you need from your source files - may not be necessary depending on what's in the files
#"Sheet Filter" = Table.TransformColumns(#"Workbook Data",{"Content", each Table.SelectRows(_, each [Name] = "Sheet1")}),
//Step to Name the column you want to extract data from
#"Column Name" = "Column1",
//Extract a List of the values in the specified column
#"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each List.Distinct(Table.Column(_{0}[Data],#"Column Name"))}),
//Expand all the lists
#"Expanded Content" = Table.ExpandListColumn(#"Column Values", "Content"),
#"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
#"Removed Duplicates"
EDIT
To select multiple columns and provide the distinct rows you could change the steps starting from the #"Column Name"
This could end up taking a fair bit longer than the previous step depending on how much data you have, but it should do the job
//Step to Name the column you want to extract data from
#"Column Name" = {"Column1","Column2","Column5"},
//Extract a List of the values in the specified column
#"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each Table.SelectColumns(_{0}[Data],#"Column Name")}),
//In each nested table, filter down to distinct rows
#"Distinct rows in Nested Tables" = Table.TransformColumns(#"Column Values",{"Content", each Table.Distinct(_)}),
//Expand nested table column
#"Expanded Content" = Table.ExpandTableColumn(#"Distinct rows in Nested Tables", "Content", #"Column Name"),
//Remove Duplicates in combined table
#"Removed Duplicates" = Table.Distinct(#"Expanded Content")
in
#"Removed Duplicates"