Search code examples
excelpdfpowerquery

Is there a way to parameterize an Excel Power Query so that the same transforms are applied to multiple tables?


I have a PDF file with 250+ pages in it. Each page has a table which is the same format on all pages. The data is different but the structure is the same.

let
    Source = Pdf.Tables(File.Contents("C:\path\to\file.pdf"), [Implementation="1.3"]),
    Table00X = Source{[Id="Table007"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Table00X,{{"Column1", type text}, {"Column2", type text}}),
    #"Transposed Table" = Table.Transpose(#"Changed Type"),
    #"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true])
in
    #"Promoted Headers"

This works great for an individual page in my PDF file, but I don't really want to have to copy/paste this query for every page and tweak Id="Table007" to Id="Table008" all the way to Id="Table265" in order to iterate through each page.

I would like to run this/similar query for each "Table" in my PDF and merge the output into a single result set.

Is this possible? Anyone done something similar?

I have multiple PDFs, with multiple pages and they get updated frequently, so it would be a life saver if I could use Power Query to import my data!

Im thinking there could be a foreach/loop function, or maybe a way to parameterize it?


Solution

  • Lets say you have a PDF where each page has a table and text

    enter image description here

    When you read that in PowerQuery you get

    enter image description here

    We can filter to Kind=Table, and for all the results, apply a function that does a transpose and promote:

    let  Source = Pdf.Tables(File.Contents("C:\Temp\a.pdf"), [Implementation="1.3"]),
    
    Process = (x as table) as table =>
        let  #"Transposed Table" = Table.Transpose(x),
        #"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true])
    in #"Promoted Headers" ,
    
    #"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Table")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each Process([Data])),
    ColumnsToExpand = List.Distinct(List.Combine(List.Transform(Table.Column(#"Added Custom", "Custom"), each if _ is table then Table.ColumnNames(_) else {}))),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", ColumnsToExpand , ColumnsToExpand ),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Id", "Kind", "Data"})
    in  #"Removed Columns"
    

    thus generating

    enter image description here