Listing all the tags in a PDF document using ABCpdf


I am trying to use the ABCpdf .NET component (version 7) to process some PDFs and generate metadata. Is there any way to list all the tags in a PDF document? As an example of a tagged PDF, I am using this file here

Are there any other components or tools available for listing or extracting PDF tags?

Thanks in advance for your help


Solution

  • Use iTextSharp. It's free and you only need the "itextsharp.dll".

    http://sourceforge.net/projects/itextsharp/

    Here is a simple function for reading the text out of a PDF.

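    Public Shared Function GetTextFromPDF(PdfFileName As String) As String
        Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
        Dim sOut As New System.Text.StringBuilder()
    
        Try
            ' Page numbers in iTextSharp are 1-based.
            For i As Integer = 1 To oReader.NumberOfPages
                ' Use a fresh strategy per page so text does not carry over between pages.
                Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy()
                sOut.Append(iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its))
            Next
        Finally
            ' Release the file handle even if extraction fails.
            oReader.Close()
        End Try
    
        Return sOut.ToString()
    End Function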
    

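    iTextSharp can also read a document's tags. In a tagged PDF they are stored in the structure tree, which is reachable from the document catalog's /StructTreeRoot entry.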
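
    To list the tags themselves, one option is to walk the structure tree and collect the structure type (/S) of every element. The following is only a rough sketch, assuming iTextSharp 5.x; the names PdfTagLister, GetStructureTags and CollectTags are made up for illustration and are not part of the library. It reads the raw type names and does not apply the document's /RoleMap, so custom tag names are returned as written.

    Imports System.Collections.Generic
    Imports iTextSharp.text.pdf

    Public Module PdfTagLister
        ' Returns the structure types found in the structure tree, in tree order.
        ' An untagged PDF (no /StructTreeRoot) yields an empty list.
        Public Function GetStructureTags(pdfFileName As String) As List(Of String)
            Dim tags As New List(Of String)
            Dim reader As New PdfReader(pdfFileName)
            Try
                Dim root As PdfDictionary = reader.Catalog.GetAsDict(PdfName.STRUCTTREEROOT)
                If root IsNot Nothing Then
                    ' /K holds the children: a single element or an array of them.
                    CollectTags(root.GetDirectObject(PdfName.K), tags)
                End If
            Finally
                reader.Close()
            End Try
            Return tags
        End Function

        ' Recursively visits structure elements. Marked-content IDs (plain
        ' integers) and reference dictionaries without /S add nothing.
        Private Sub CollectTags(kid As PdfObject, tags As List(Of String))
            If kid Is Nothing Then Return
            If kid.IsArray() Then
                Dim kids As PdfArray = DirectCast(kid, PdfArray)
                For i As Integer = 0 To kids.Size - 1
                    CollectTags(kids.GetDirectObject(i), tags)
                Next
            ElseIf kid.IsDictionary() Then
                Dim elem As PdfDictionary = DirectCast(kid, PdfDictionary)
                Dim structType As PdfName = elem.GetAsName(PdfName.S)
                If structType IsNot Nothing Then
                    ' DecodeName drops the leading slash, e.g. "/H1" becomes "H1".
                    tags.Add(PdfName.DecodeName(structType.ToString()))
                End If
                CollectTags(elem.GetDirectObject(PdfName.K), tags)
            End If
        End Sub
    End Module

    Calling PdfTagLister.GetStructureTags("tagged.pdf") (the file name is a placeholder) returns one entry per structure element, so duplicates are expected; pass the result through a HashSet(Of String) if only the distinct tag names are needed.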