I'm currently working with the new German ZUGFeRD files. These are PDF/A-3 files that have an embedded XML file containing the data.
I want to extract this XML file from the PDF/A-3 using ABCpdf 8.1 with C#.
Any idea how to do this?
Thanks a lot and regards,
I don't know ABCpdf, but I would guess that most PDF libraries offer similar access to a PDF's content.
First take a look at Das-ZUGFeRD-Format_1p0.pdf, especially page 112. The image there shows the object tree you have to traverse in order to find the XML stream.
The tree gives you the names, the types and the direction. With that you can traverse the PDF object tree to get to the XML content you are looking for.
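The steps based on the diagram:
1. Get the array named AF from the Catalog.
2. Get the first element of the AF array (it should be a file spec).
3. Get the dictionary named EF from the file spec.
4. Get the embedded file stream from EF (in the PDF specification it is stored under the key F).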
These are the steps you need to perform in order to get to the content.
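To make the traversal concrete, here is a minimal, library-independent sketch in Python. It does not use ABCpdf or any real PDF parser: the PDF objects are modelled as plain dicts and lists, and the function name extract_zugferd_xml, the "data" key and the sample catalog are all made up for illustration. It also assumes the embedded stream is either uncompressed or compressed with FlateDecode. With a real library, each lookup would be replaced by that library's dictionary and array accessors.

```python
import zlib


def extract_zugferd_xml(catalog):
    """Follow Catalog -> AF -> first file spec -> EF -> F and return the XML bytes."""
    af = catalog.get("AF")
    if not af:
        raise ValueError("Catalog has no AF array, so there is no associated file")

    file_spec = af[0]  # the first element should be a file spec dictionary
    ef = file_spec.get("EF")
    if ef is None:
        raise ValueError("file spec has no EF dictionary")

    stream = ef.get("F")  # the embedded file stream itself
    if stream is None:
        raise ValueError("EF dictionary has no F stream")

    data = stream["data"]  # hypothetical key holding the raw stream bytes
    # Assumption: the stream is either stored raw or compressed with FlateDecode.
    if stream.get("Filter") == "FlateDecode":
        data = zlib.decompress(data)
    return data


# Hypothetical stand-in for the object tree of a parsed ZUGFeRD document.
xml = b"<?xml version='1.0'?><Invoice/>"
catalog = {
    "Type": "Catalog",
    "AF": [
        {
            "Type": "Filespec",
            "F": "ZUGFeRD-invoice.xml",  # file name entry of the file spec
            "EF": {"F": {"Filter": "FlateDecode", "data": zlib.compress(xml)}},
        }
    ],
}

print(extract_zugferd_xml(catalog).decode("utf-8"))
```

The explicit checks at each level are deliberate: if a file is not a valid ZUGFeRD document, the error tells you which node of the tree is missing.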
To display the structure of a PDF and browse the tree, I would recommend using a tool like iText RUPS.