I have a PDF file with 250+ pages in it. Each page has a table which is the same format on all pages. The data is different but the structure is the same.
let
Source = Pdf.Tables(File.Contents("C:\path\to\file.pdf"), [Implementation="1.3"]),
Table00X = Source{[Id="Table007"]}[Data],
#"Changed Type" = Table.TransformColumnTypes(Table00X,{{"Column1", type text}, {"Column2", type text}}),
#"Transposed Table" = Table.Transpose(#"Changed Type"),
#"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true])
in
#"Promoted Headers"
This works great for an individual page in my PDF file, but I don't really want to have to copy/paste this query for every page and tweak Id="Table007" to Id="Table008" all the way to Id="Table265" in order to iterate through each page.
I would like to run this/similar query for each "Table" in my PDF and merge the output into a single result set.
Is this possible? Anyone done something similar?
I have multiple PDFs, with multiple pages and they get updated frequently, so it would be a life saver if I could use Power Query to import my data!
Im thinking there could be a foreach/loop function, or maybe a way to parameterize it?
Lets say you have a PDF where each page has a table and text
When you read that in PowerQuery you get
We can filter to Kind=Table, and for all the results, apply a function that does a transpose and promote:
let Source = Pdf.Tables(File.Contents("C:\Temp\a.pdf"), [Implementation="1.3"]),
Process = (x as table) as table =>
let #"Transposed Table" = Table.Transpose(x),
#"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true])
in #"Promoted Headers" ,
#"Filtered Rows" = Table.SelectRows(Source, each ([Kind] = "Table")),
#"Added Custom" = Table.AddColumn(#"Filtered Rows", "Custom", each Process([Data])),
ColumnsToExpand = List.Distinct(List.Combine(List.Transform(Table.Column(#"Added Custom", "Custom"), each if _ is table then Table.ColumnNames(_) else {}))),
#"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", ColumnsToExpand , ColumnsToExpand ),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"Id", "Kind", "Data"})
in #"Removed Columns"
thus generating