Search code examples
vbaexcelpdfacrobat

Not seeing PDF fields from Excel VBA


I've been tasked with creating an Excel tool to import data from thousands of PDFs into Excel. I've installed Adobe Acrobat X Professional, and it seems to be working fine. I'm using Excel 2007.

I found sample code here: http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/

And modifying it only slightly to prompt me to browse to the PDF file, and setting a reference to "Adobe Acrobat 10.0 Type Library", I'm trying this code:

Sub Main()
Dim AcroApp As Acrobat.CAcroApp
Dim theForm As Acrobat.CAcroPDDoc
Dim jso As Object
Dim text1, text2 As String

Set AcroApp = CreateObject("AcroExch.App")
Set theForm = CreateObject("AcroExch.PDDoc")

Dim strFullPath As String
strFullPath = Application.GetOpenFilename()
theForm.Open (strFullPath) 'theForm.Open ("C:\temp\sampleForm.pdf")

Set jso = theForm.GetJSObject

' get the information from the form fields Text1 and Text2
text1 = jso.getfield("MFR_ctrl33605579").Value 'jso.getfield("Text1").Value
'(etc)
End Sub

("MFR_ctrl33605579" is the name of one of the text fields; I found that in the editor that is part of or included with Acrobat Pro, called "Adobe LiveCycle Designer".)

It runs without error until it gets to the getfield method. Then I get the error "Object required".

If I run TypeName on AcroApp, theForm, and jso, I get CAcroApp, CAcroPDDoc, and object respectively:

?typename(AcroApp)
CAcroApp
?typename(theForm)
CAcroPDDoc
?typename(jso)
Object

If I put them in the Watches window, I initially get plus signs to the left of all three of them, but if I click on those plus signs, only one line appears under each one, saying "< No Variables >" in the Value column.

I wondered if the document was somehow locked against being read in this way. I read that if it was locked, a padlock would appear at the upper left in Acrobat, and it would not allow saving as text. But no padlock appears that I see, and it does allow saving as plain text. So as far as I can tell it doesn't seem to be locked.

Any suggestions on what I can try?

UPDATE:

I'm just getting started trying the suggestions, but I wanted to note that after trying rheitzman's For loop to get the field name (using getNthFieldName), I see that my code works if I use this for the field name:

form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]

Or in other words:

text1 = jso.getfield("MFR_form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]").Value

That would allow me to identify fields by that very long, apparently fully-qualified reference, which would get me through the project. But first I'm going to check the other ideas to see if the fields can be found by short name only.

UPDATE 2:

I see now that I can inspect some details of some of the object model in the Object Browser (filter Libraries on Acrobat), even though it doesn't show up in the Watch window.

However, it doesn't show objects created by methods, such as the object created by the AcroExch.PDDoc.GetJSObject method (see AcroPDDoc in the object model). The method of that object used here, getNthFieldName, does not appear in the object browser at all.

And, I don't see any other way there of identifying the fields by their short field names.

So ... for this project, I'm just going to use the long field names returned by getNthFieldName.


Solution

  • iTextSharp is much easier to use!

    That said, here's a snippet to read all fields.

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim pdfForm As Acrobat.AcroPDDoc
        Dim jso As Object ' ?
        Dim s As String = ""
        Dim i As Integer
        Dim strFullPath As String = "H:\OIS\ENFORCE\OPEN_BURN\Ag\temp\Open_Burn_Template_Out.pdf"
        Try
            pdfForm = New Acrobat.AcroPDDoc
            pdfForm.Open(strFullPath)
            jso = pdfForm.GetJSObject
            For i = 0 To jso.numfields() - 1
                s = jso.getNthFieldName(i)
                Debug.Print(s & ": " & jso.getField(s).value)
            Next
        Catch ex As Exception
            MsgBox(ex.Message)
        End Try
    End Sub
    

    It could be your "form" doesn't have any fields (or a flattened form), or the names are off.

    The JSObject properties are a bit of a mystery. I found a post that used reflection to extract data but it just so happens that the JSObject properties are usable if you know what they are! e.g. numfields, getNthFieldName

    If anyone finds a reference link for JSObject please post in comments.