Search code examples
powerbipowerquerym

Power Query combine three external Excel source files and append specific columns


I'm trying to create a lookup table combining the my 3 source files primary keys columns, this way I won't have to do an outer join to find the missing records from each source and then append them together. I've found how to "combine" two source files but I can't figure out how to drill into the columns/fields lists so that I can select only Column 1 (or "Item Code" header name in the Excel files).

Here is the code I have so far to combine 2/3 files (as a trial):

let
Source = Table.Combine({Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_MRP_Dmd.xlsx"), null, true), 

Excel.Workbook(File.Contents("C:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources\JDE_Open_PO.xlsx"), null, true)})

in Source

Solution

  • If you've got a less than ideal data source (ie lots of irrelevant columns, duplicates in the data you want) then one way to avoid materialising a whole bunch of unnecessary data would be to perform all your transformations/filtering on nested table cells rather than loading all the data up just to remove columns/dupes.

    The M code below should be a rough start that hopefully gets you on the way

    let
        //Adjust the Source step to refer to the relevant folder your 3 source files are saved in
        Source = Folder.Files("CC:\Users\Desktop\Dry Good Demad-Supply Report\MRP_ParentDmd\Data_Sources"),
    
        //Filter the file list to leave just your 3 source files if required
        #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),
    
        //Remove all columns excep the Binary file column
        #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Content"}),
    
        //Convert the binary file to the file data ie sheets, tables, named ranges etc - the same data you get when you use a file as a source
        #"Workbook Data" = Table.TransformColumns(#"Removed Other Columns",{"Content", each Excel.Workbook(_)}), 
    
        //Filter the nested file data table cell to select the sheet you need from your source files - may not be necessary depending on what's in the files
        #"Sheet Filter" = Table.TransformColumns(#"Workbook Data",{"Content", each Table.SelectRows(_, each [Name] = "Sheet1")}),     
    
        //Step to Name the column you want to extract data from
        #"Column Name" = "Column1",
    
        //Extract a List of the values in the specified column
        #"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each List.Distinct(Table.Column(_{0}[Data],#"Column Name"))}), 
    
        //Expand all the lists
        #"Expanded Content" = Table.ExpandListColumn(#"Column Values", "Content"),
    
        #"Removed Duplicates" = Table.Distinct(#"Expanded Content")
    in
        #"Removed Duplicates"
    

    EDIT To select multiple columns and provide the distinct rows you could change the steps starting from the #"Column Name"

    This could end up taking a fair bit longer than the previous step depending on how much data you have, but it should do the job

        //Step to Name the column you want to extract data from
        #"Column Name" = {"Column1","Column2","Column5"},
    
        //Extract a List of the values in the specified column
        #"Column Values" = Table.TransformColumns(#"Sheet Filter",{"Content", each Table.SelectColumns(_{0}[Data],#"Column Name")}),
    
        //In each nested table, filter down to distinct rows
        #"Distinct rows in Nested Tables" = Table.TransformColumns(#"Column Values",{"Content", each Table.Distinct(_)}),
    
        //Expand nested table column
        #"Expanded Content" = Table.ExpandTableColumn(#"Distinct rows in Nested Tables", "Content", #"Column Name"), 
    
        //Remove Duplicates in combined table
        #"Removed Duplicates" = Table.Distinct(#"Expanded Content")
    in
        #"Removed Duplicates"