
PDF to Source conversion


Is there a way to find the source from which a PDF has been built? If it has been built from XML, I want to know whether I can get the XML back from the PDF and parse it.


Solution

  • Is there a way to find the source from which any PDF has been built?

    No, there is no way to do that. PDF is a presentation format, not a data storage format, and in general a PDF file does not preserve the structure of the data it presents. It may not even contain whole words or phrases. In a greatly oversimplified view, a PDF is a sequence of instructions like:

    - Draw character 'a' at coordinates (10, 30)
    - Move the pen to the point (40, 40)
    - Draw a line from the current point to the point (50, 50)
    ...
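    To make the list above concrete, here is a minimal sketch in Python of what such instructions look like in a real PDF content stream, plus a toy tokenizer that groups each operator with its operands. The stream is hand-written for illustration; the operators (BT/ET, Tf, Td, Tj, m, l, S) are standard PDF operators, but the font name /F1 is a hypothetical resource that a real page would have to define, and the tokenizer is an assumption-laden simplification (no escaped or space-containing strings, no inline images). Notice that all you can recover is "show these glyph codes here" and "draw a line there"; nothing points back to an XML source.

```python
# Hand-written PDF content stream mirroring the pseudo-instructions above.
# /F1 is a hypothetical font resource; a real page would declare it in its
# resource dictionary.
CONTENT_STREAM = b"""
BT
/F1 12 Tf
10 30 Td
(a) Tj
ET
40 40 m
50 50 l
S
"""

# Standard PDF operators used in the stream above:
# BT/ET begin/end a text object, Tf selects font and size,
# Td moves the text position, Tj shows a string of glyph codes,
# m starts a path (moveto), l appends a line (lineto), S strokes the path.
OPERATORS = {"BT", "ET", "Tf", "Td", "Tj", "m", "l", "S"}


def tokenize(stream: bytes) -> list[str]:
    """Split a simple content stream on whitespace.

    Simplifying assumption: string operands contain no spaces, escapes or
    nested parentheses. A real PDF parser must handle all of those.
    """
    return stream.decode("latin-1").split()


def operations(stream: bytes) -> list[tuple[str, list[str]]]:
    """Pair each operator with the operands that precede it (postfix syntax)."""
    ops: list[tuple[str, list[str]]] = []
    operands: list[str] = []
    for tok in tokenize(stream):
        if tok in OPERATORS:
            ops.append((tok, operands))
            operands = []
        else:
            operands.append(tok)
    return ops


if __name__ == "__main__":
    for op, args in operations(CONTENT_STREAM):
        print(op, args)
```

    The string operand of Tj is a sequence of codes interpreted through the font's encoding, not necessarily readable characters, which is one reason text extracted from a PDF can come out garbled or without word boundaries.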