c# .net vb.net itext itext7

Is iText7 available in VB.Net or only C#?


I want to extract the contents of text fields from PDF files and bring them into my WinForms project. While searching, I found references to iTextSharp, but then saw that it has been replaced by iText7, and everything I read refers only to using it from C#. My WinForms project is written in VB. Any pointers on the best way to get that data into my project would be much appreciated.


Solution

  • iText7 is a .NET library, so it can be used from VB.NET as well as C#. To extract text from a PDF file using iText7, try the following:

    Pre-requisite: Download/install NuGet package itext7

    Add the following Imports statements:

    Imports iText.Kernel.Pdf
    Imports iText.Kernel.Pdf.Canvas.Parser.Listener
    Imports iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor
    

    GetTextFromPdf:

    Public Function GetTextFromPdf(filename As String) As String
        Dim sb As System.Text.StringBuilder = New System.Text.StringBuilder()
    
        'open the PDF for reading; Using closes the document when the block ends
        Using doc As PdfDocument = New PdfDocument(New PdfReader(filename))
            'Dim strategy As LocationTextExtractionStrategy = New LocationTextExtractionStrategy()
    
            'page numbers are 1-based
            For i As Integer = 1 To doc.GetNumberOfPages()
                Dim page As PdfPage = doc.GetPage(i)
                'Dim text = iText.Kernel.Pdf.Canvas.Parser.PdfTextExtractor.GetTextFromPage(page, strategy)
                'GetTextFromPage is a shared method of the imported PdfTextExtractor class
                Dim text As String = GetTextFromPage(page)
                sb.AppendLine(text)
            Next
        End Using
    
        Return sb.ToString()
    End Function
    

    The code for GetTextFromPdf is adapted from here.

    Update:

    The code below shows how to read the field names and field values from an AcroForm in a PDF document:

    Add the following Imports statements:

    Imports iText.Forms
    Imports iText.Kernel.Pdf
    

    GetTextFromPdfFields:

    Public Function GetTextFromPdfFields(filename As String) As String
        Dim sb As System.Text.StringBuilder = New System.Text.StringBuilder()
    
        'open the PDF for reading; Using closes the document when the block ends
        Using doc As PdfDocument = New PdfDocument(New PdfReader(filename))
    
            'get AcroForm from document
            Dim form As PdfAcroForm = PdfAcroForm.GetAcroForm(doc, True)
    
            'get form fields
            Dim fieldDict As IDictionary(Of String, Fields.PdfFormField) = form.GetFormFields()
    
            'loop through form fields
            For Each kvp As KeyValuePair(Of String, Fields.PdfFormField) In fieldDict
                'kvp.Value is the field itself, so no further lookup by key is needed
                Dim field As Fields.PdfFormField = kvp.Value
                Dim type As PdfName = field.GetFormType()
                Dim fieldName As PdfString = field.GetFieldName()
                Dim fieldValue As String = field.GetValueAsString()
    
                If fieldName IsNot Nothing Then
                    'append data to instance of StringBuilder
                    sb.AppendLine("Type: " & type.ToString() & " FieldName: " & fieldName.ToString() & " Value: " & fieldValue)
                End If
            Next
        End Using
    
        Return sb.ToString()
    End Function
    

    Note: The code for GetTextFromPdfFields is adapted from here.
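
    Since the question asks specifically about text fields, it may be more convenient to get the values back as a dictionary rather than as one formatted string. The following is only a sketch under some assumptions: the module name PdfFieldReader and the function name GetTextFieldValues are made up for illustration, and it relies on the same iText7 calls used in GetTextFromPdfFields (newer iText releases may name some of these methods differently). It passes False to GetAcroForm so that no form is created when the PDF has none, and keeps only fields whose form type is PdfName.Tx (text fields).

```vbnet
Imports System.Collections.Generic
Imports iText.Forms
Imports iText.Forms.Fields
Imports iText.Kernel.Pdf

'Hypothetical helper module; the names here are illustrative, not part of iText.
Public Module PdfFieldReader

    'Returns the values of text fields only, keyed by field name.
    Public Function GetTextFieldValues(filename As String) As Dictionary(Of String, String)
        Dim values As New Dictionary(Of String, String)()

        'open the PDF for reading; Using closes the document when the block ends
        Using doc As New PdfDocument(New PdfReader(filename))

            'False: do not create an AcroForm if the document has none
            Dim form As PdfAcroForm = PdfAcroForm.GetAcroForm(doc, False)
            If form Is Nothing Then
                Return values
            End If

            For Each kvp As KeyValuePair(Of String, PdfFormField) In form.GetFormFields()
                'PdfName.Tx is the form type of text fields; other field types are skipped
                If PdfName.Tx.Equals(kvp.Value.GetFormType()) Then
                    values(kvp.Key) = kvp.Value.GetValueAsString()
                End If
            Next
        End Using

        Return values
    End Function

End Module
```

    In a WinForms form, the result could then be assigned to controls, for example TextBox1.Text = values("CustomerName") after checking values.ContainsKey("CustomerName"). Both the control name and the field name are placeholders; the actual keys are whatever field names the PDF defines.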