Tags: xml, vb.net, pdf, itext, xfa

Auto-Fill I-9 PDF XFA Form


Good morning. I am hoping someone can help me with this. Last year I set up a VB.NET program using iTextSharp where a user could enter the information for the I-9, and that information would fill in the PDF and print. With the new I-9 I am running into difficulties I can't pin down.

First, the code doesn't error out or anything. I simply get a poor result: instead of a filled form, I get a PDF that says "The document you are trying to load requires Adobe Reader 8 or higher. You may not have the Adobe Reader installed..." etc. So I made sure that I have the most recent Reader version, tried again, and got the same result.

Thinking that perhaps there were changes in the field name structure, I attempted to read in the format/fields as I had the first time around (code below). However, now it tells me that there are no fields to read (AcroFields.Fields.Count = 0).

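Private Sub ListFieldNames(pdfTemplate As String)
    ' pdfTemplate is the path to the PDF, e.g. "c:\Temp\PDF\fw4.pdf".
    ' (Declaring it again with Dim inside the Sub would clash with the parameter.)
    Dim pdfReader As PdfReader = New PdfReader(pdfTemplate)
    Dim de As KeyValuePair(Of String, iTextSharp.text.pdf.AcroFields.Item)

    ' Print the name of every AcroForm field found in the document
    For Each de In pdfReader.AcroFields.Fields
        Console.WriteLine(de.Key.ToString())
    Next
    pdfReader.Close()
End Sub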

So, I started doing some searching and found reference to another type of PDF structure that they could have switched to: XFA. I honestly still haven't found any satisfactory documentation/samples of this, but I did find some code that seems like it should work to read in the structure of an XFA PDF (code below). There are actually 2 different methods here that I tried. The first essentially shows that there are no xmlNodes in xfaFields. The second does find a node called "data" (that's the only one it finds) but doesn't find any child nodes.

Private Sub ReadXfa(pdfTemplate As String)
    ' Static flag on the PdfReader class: open the file without the owner password
    PdfReader.unethicalreading = True
    Dim readerPDF As New PdfReader(pdfTemplate)

    Dim xfaFields = readerPDF.AcroFields.Xfa.DatasetsSom.Name2Node

    For Each xmlNode In xfaFields
        Console.WriteLine(xmlNode.Value.Name + ":" + xmlNode.Value.InnerText)
    Next
    'Example of how to get a field value
    '   Dim lastName = xfaFields.First(Function(a) a.Value.Name = "textFieldLastNameGlobal").Value.InnerText


    Dim reader As New PdfReader(pdfTemplate)
    Dim xfa As New XfaForm(reader)
    Dim node As XmlNode = xfa.DatasetsNode()
    Dim list As XmlNodeList = node.ChildNodes()
    For i As Integer = 0 To list.Count - 1
        Console.WriteLine(list.Item(i).LocalName())
        If "data".Equals(list.Item(i).LocalName()) Then
            node = list.Item(i)
            Exit For
        End If
    Next
    list = node.ChildNodes()
    For i As Integer = 0 To list.Count - 1
        Console.WriteLine(list.Item(i).LocalName())
    Next
    reader.Close()
End Sub

https://www.uscis.gov/system/files_force/files/form/i-9.pdf?download=1

The above link goes to the I-9 PDF provided by the government.

So... I guess I have multiple questions. The simplest is whether anybody has done this and can help me. Barring that, if someone could point me in the right direction regarding how to read/write from this new PDF file, that would be stupendous. I'm frankly not even certain how to determine what "type" of form they used: AcroField, XFA, something else?

Thank you so much for your time/help!


Solution

  • First, sorry, I don't do VB.NET anymore, but you should be able to convert the code that follows.

    You already found out for yourself that the new form is XFA. There's an easy non-programmatic way to see the form fields and data. You noted that you upgraded your version of Adobe Reader, so I'm guessing you're using Reader DC. From the menu options:

    Edit => Form Options => Export Data...
    

    That exports the form to an XML file you can inspect. The XML file gives you a hint that a corresponding XML document is needed to fill the form, which is quite different from how it's done with an AcroForm.

    Here's some simple code to get you started. First a method to read the blank XML document and update it:

    // Requires: using System.Collections.Generic; using System.Xml.Linq;
    //           using System.Xml.XPath;
    public string FillXml(Dictionary<string, string> fields)
    {
        // XML_INFILE => physical path to XML file exported from I-9
        XDocument xDoc = XDocument.Load(XML_INFILE);
        foreach (var kvp in fields)
        {
            // a field name can occur more than once in the I-9 form,
            // so update every matching element (no matches => nothing happens)
            var elements = xDoc.XPathSelectElements(
                string.Format("//{0}", kvp.Key)
            );
            foreach (var e in elements) { e.Value = kvp.Value; }
        }

        return xDoc.ToString();
    }
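
    Since the question is about VB.NET, here is a rough VB.NET rendering of the method above. Treat it as a sketch rather than a drop-in: the module name XfaXmlFiller and the xmlInFile parameter (standing in for XML_INFILE) are made-up names, and it assumes the project references System.Xml.Linq.

    ```vbnet
    Imports System.Collections.Generic
    Imports System.Xml.Linq
    Imports System.Xml.XPath

    Module XfaXmlFiller
        ' xmlInFile (hypothetical name) => physical path to the XML file exported from the I-9
        Public Function FillXml(xmlInFile As String,
                                fields As Dictionary(Of String, String)) As String
            Dim xDoc As XDocument = XDocument.Load(xmlInFile)

            For Each kvp As KeyValuePair(Of String, String) In fields
                ' A field name can occur more than once in the exported data,
                ' so every matching element is updated.
                For Each element As XElement In xDoc.XPathSelectElements("//" & kvp.Key)
                    element.Value = kvp.Value
                Next
            Next

            Return xDoc.ToString()
        End Function
    End Module
    ```

    Keys that match no element are silently skipped, so a misspelled field name shows up as an empty field in the output rather than as an error.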
    

    Now that we have a method to create valid XML, fill the form fields with some sample data:

    var fields = new Dictionary<string, string>()
    {
        { "textFieldLastNameGlobal", "Doe" },
        { "textFieldFirstNameGlobal", "Jane" }
    };
    var filledXml = FillXml(fields);
    
    using (var ms = new MemoryStream())
    {
        // PDF_READER => I-9 PdfReader instance
        using (PDF_READER)
        {
            // I-9 has password security
            PdfReader.unethicalreading = true;
            // maintain usage rights on output file
            using (var stamper = new PdfStamper(PDF_READER, ms, '\0', true))
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(filledXml);
                stamper.AcroFields.Xfa.FillXfaForm(doc.DocumentElement);
            }
        }
        // OUTFILE => physical path of the filled PDF to write
        File.WriteAllBytes(OUTFILE, ms.ToArray());
    }
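
    The stamping step above could look roughly like this in VB.NET. This is an untested sketch against iTextSharp 5.x: the module name XfaFormFiller and the parameters templatePath, filledXml and outputPath are made-up names that replace the PDF_READER and OUTFILE placeholders.

    ```vbnet
    Imports System.IO
    Imports System.Xml
    Imports iTextSharp.text.pdf

    Module XfaFormFiller
        ' templatePath, filledXml and outputPath are hypothetical parameter names.
        Public Sub FillForm(templatePath As String, filledXml As String, outputPath As String)
            ' The I-9 has password security
            PdfReader.unethicalreading = True

            Dim reader As New PdfReader(templatePath)
            Try
                Using ms As New MemoryStream()
                    ' Null-char version plus append mode (True): maintain usage rights on the output file
                    Using stamper As New PdfStamper(reader, ms,
                                                    Microsoft.VisualBasic.ControlChars.NullChar, True)
                        Dim doc As New XmlDocument()
                        doc.LoadXml(filledXml)
                        ' Replace the form's XFA data with the filled XML
                        stamper.AcroFields.Xfa.FillXfaForm(doc.DocumentElement)
                    End Using
                    File.WriteAllBytes(outputPath, ms.ToArray())
                End Using
            Finally
                reader.Close()
            End Try
        End Sub
    End Module
    ```

    The filledXml argument is the string produced by the XML-filling method, so the two steps chain together without an intermediate file.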
    

    To answer your last question, how to determine the form 'type', use the PdfReader instance like so:

    PDF_READER.AcroFields.Xfa.XfaPresent
    

    true means XFA; false means the document has no XFA data, so any form fields it has are AcroForm fields.
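
    The check above can be wrapped in a small VB.NET helper. This is a sketch assuming iTextSharp 5.x; the module name FormTypeCheck and the function name IsXfaForm are made up for illustration.

    ```vbnet
    Imports iTextSharp.text.pdf

    Module FormTypeCheck
        ' Returns True when the PDF at pdfPath carries an XFA form.
        Public Function IsXfaForm(pdfPath As String) As Boolean
            Dim reader As New PdfReader(pdfPath)
            Try
                Return reader.AcroFields.Xfa.XfaPresent
            Finally
                reader.Close()
            End Try
        End Function
    End Module
    ```

    Calling it on the template before choosing a fill strategy avoids the situation in the question, where AcroFields.Fields.Count is 0 with no explanation.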