Is there a way to find the source from which any PDF has been built? The reason is that if it has been built from XML, I want to know whether I can get the XML back from the PDF and parse it.
Is there a way to find the source from which any PDF has been built?
No, in general there is no way to do that. PDF is a presentation format, not a data storage format, and it usually does not preserve the structure of the data it presents. A PDF file may not even contain words or phrases as such. A PDF can be thought of (in a grossly oversimplified way) as a sequence of instructions like:
- Draw character 'a' at coordinates 10, 30
- Move the pen to the point 40, 40
- Draw a line from the current point to the point 50, 50
...
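
To make the point concrete, here is a minimal Python sketch of that oversimplified model. Everything in it is hypothetical: the DrawChar, MoveTo and LineTo classes and the guess_text function are illustrative names, not real PDF operators or any library's API. It only shows that, once a page is reduced to positioned glyphs, text can at best be guessed from coordinates, and nothing about an original XML structure survives.

```python
from dataclasses import dataclass


# Hypothetical instruction types for the toy model (not real PDF operators).
@dataclass
class DrawChar:
    char: str
    x: float
    y: float


@dataclass
class MoveTo:
    x: float
    y: float


@dataclass
class LineTo:
    x: float
    y: float


# A toy "page": glyphs may be emitted in any order, mixed with line drawing.
# There are no words, no elements, no attributes: only positions.
page = [
    DrawChar("c", 30, 30),
    DrawChar("a", 10, 30),
    MoveTo(40, 40),
    LineTo(50, 50),
    DrawChar("b", 16, 30),
]


def guess_text(instructions, gap=8.0):
    """Guess the text by grouping glyphs with the same y and sorting by x.

    A space is inserted when two neighbouring glyphs are more than `gap`
    units apart. This is a heuristic, assumed here for illustration only.
    """
    rows = {}
    for ins in instructions:
        if isinstance(ins, DrawChar):
            rows.setdefault(ins.y, []).append(ins)

    lines = []
    # Assumes y grows upward, so larger y means higher on the page.
    for y in sorted(rows, reverse=True):
        row = sorted(rows[y], key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            if cur.x - prev.x > gap:
                text += " "
            text += cur.char
        lines.append(text)
    return "\n".join(lines)


print(guess_text(page))  # prints "ab c"
```

Whether "ab c" is one word, two words, or part of a table cell is a guess made from spacing alone, which is why recovering the source XML from such a page is not generally possible.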