I am trying to use the AbcPdf .NET component (version 7) to process some PDFs and generate metadata. Is there any way to list all the tags in a PDF document? As an example of a tagged PDF, I am using this file here
Are there any other components or tools available for listing or extracting PDF tags?
Thanks in advance for your help.
Use iTextSharp. It's free and you only need the "itextsharp.dll".
Here is a simple function for reading the text out of a PDF (note that it extracts the page text, not the tags themselves).
    Public Shared Function GetTextFromPDF(PdfFileName As String) As String
        Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
        Dim sOut = ""
        ' iTextSharp page numbers are 1-based
        For i = 1 To oReader.NumberOfPages
            Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
            sOut &= iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its)
        Next
        ' Release the file handle once all pages have been read
        oReader.Close()
        Return sOut
    End Function
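iTextSharp also lets you work with tags: in a tagged PDF, the tags live in the document's structure tree, which you can reach from the catalog (oReader.Catalog) under the StructTreeRoot entry.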
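For listing the tags, a minimal sketch along these lines may help. It assumes iTextSharp 5.x and a PDF that really is tagged (one with a StructTreeRoot entry in its catalog). The module name PdfTagLister and the functions GetStructureTags and CollectTags are made up for this illustration and are not part of iTextSharp. I have not run it against your file, so treat it as a starting point.

    Imports System.Collections.Generic
    Imports iTextSharp.text.pdf

    ' Hypothetical helper module, not part of iTextSharp
    Public Module PdfTagLister

        ' Returns the structure element types in tree order.
        ' Names come back in PDF name syntax, with a leading slash.
        Public Function GetStructureTags(pdfFileName As String) As List(Of String)
            Dim tags As New List(Of String)
            Dim reader As New PdfReader(pdfFileName)
            Try
                ' An untagged PDF has no StructTreeRoot, so the list stays empty
                Dim root As PdfDictionary = reader.Catalog.GetAsDict(PdfName.STRUCTTREEROOT)
                If root IsNot Nothing Then
                    CollectTags(root.GetDirectObject(PdfName.K), tags)
                End If
            Finally
                reader.Close()
            End Try
            Return tags
        End Function

        ' Walks the /K (kids) entries recursively. A kid may be an array,
        ' a structure element dictionary, or a marked-content id (a number),
        ' which is skipped here.
        Private Sub CollectTags(node As PdfObject, tags As List(Of String))
            If node Is Nothing Then Return
            If node.IsArray() Then
                Dim kids = DirectCast(node, PdfArray)
                For i = 0 To kids.Size - 1
                    CollectTags(kids.GetDirectObject(i), tags)
                Next
            ElseIf node.IsDictionary() Then
                Dim elem = DirectCast(node, PdfDictionary)
                ' /S holds the structure type, i.e. the tag name
                Dim tagName As PdfName = elem.GetAsName(PdfName.S)
                If tagName IsNot Nothing Then tags.Add(tagName.ToString())
                CollectTags(elem.GetDirectObject(PdfName.K), tags)
            End If
        End Sub
    End Module

Calling PdfTagLister.GetStructureTags("tagged.pdf") (the file name is a placeholder) should give you one entry per structure element. If you want the tagged content as well as the tag names, iTextSharp 5.x also ships iTextSharp.text.pdf.parser.TaggedPdfReaderTool, which as far as I know can dump the structure tree as XML via ConvertToXml.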