Good morning. I am hoping someone can help me on this topic. Last year I set up a VB.NET program using iTextSharp
where a user could enter the information for the I-9, and that information would fill in the PDF and print. With the new I-9, I am running into problems I can't pin down.
First, the code doesn't error out or anything. I simply get a poor result, as instead of a filled form I get a PDF that says "The document you are trying to load requires Adobe Reader 8 or higher. You may not have the Adobe Reader installed..." etc. So, I made sure that I have the most recent Reader version, tried again and same result.
Thinking that perhaps there were changes in the field name structure, I attempted to read in the format/fields as I had the first time around (code below). However, now it tells me that there are no fields to read (AcroFields.Fields.Count = 0).
' Requires: Imports iTextSharp.text.pdf
Private Sub ListFieldNames(pdfTemplate As String)
    ' pdfTemplate is the path to the form, e.g. "c:\Temp\PDF\fw4.pdf"
    Dim pdfReader As PdfReader = New PdfReader(pdfTemplate)
    ' Print the name of every AcroForm field in the document
    For Each de As KeyValuePair(Of String, AcroFields.Item) In pdfReader.AcroFields.Fields
        Console.WriteLine(de.Key)
    Next
    pdfReader.Close()
End Sub
So, I started doing some searching and found reference to another type of PDF structure that they could have switched to; XFA. I honestly still haven't found any satisfactory documentation/samples of this, but I did find some code that seems like it should work to read in the structure of an XFA PDF. (Code below). There're actually 2 different methods here that I tried. The first essentially shows that there're no xmlNodes in xfaFields. The second does find a node called "data" (that's the only one it finds) but doesn't find any child nodes.
Private Sub ReadXfa(pdfTemplate As String)
    ' Allow opening a password-secured PDF without the owner password
    PdfReader.unethicalreading = True

    ' Method 1: list the nodes indexed in the XFA datasets
    Dim readerPDF As New PdfReader(pdfTemplate)
    Dim xfaFields = readerPDF.AcroFields.Xfa.DatasetsSom.Name2Node
    For Each xmlNode In xfaFields
        Console.WriteLine(xmlNode.Value.Name & ":" & xmlNode.Value.InnerText)
    Next
    'Example of how to get a field value
    ' Dim lastName = xfaFields.First(Function(a) a.Value.Name = "textFieldLastNameGlobal").Value.InnerText
    readerPDF.Close()

    ' Method 2: walk the datasets node directly
    Dim reader As New PdfReader(pdfTemplate)
    Dim xfa As New XfaForm(reader)
    Dim node As XmlNode = xfa.DatasetsNode
    Dim list As XmlNodeList = node.ChildNodes
    ' Find the "data" child of the datasets node
    For i As Integer = 0 To list.Count - 1
        Console.WriteLine(list.Item(i).LocalName)
        If "data".Equals(list.Item(i).LocalName) Then
            node = list.Item(i)
            Exit For
        End If
    Next
    ' List the children of "data" (none are found for the blank I-9)
    list = node.ChildNodes
    For i As Integer = 0 To list.Count - 1
        Console.WriteLine(list.Item(i).LocalName)
    Next
    reader.Close()
End Sub
https://www.uscis.gov/system/files_force/files/form/i-9.pdf?download=1
The above link goes to the i9 PDF provided by the government.
So... I guess I have multiple questions. The simplest is whether anybody has done this process and can help me. Barring that, if someone could point me in the right direction regarding how to read/write from this new PDF file, that would be stupendous. I'm frankly not even certain how to determine what "type" of form they used: AcroField, XFA, something else?
Thank you so much for your time/help!
First, sorry I don't do vb.net anymore, but you should be able to convert the code that follows.
You already found out for yourself that the new form is XFA. There's an easy non-programmatic way to see the form fields and data. You noted that you upgraded your version of Adobe Reader, so I'm guessing you're using Reader DC. From the menu options:
Edit => Form Options => Export Data...
That exports the form to an XML file you can inspect. The XML file gives you a hint that a corresponding XML document is needed to fill the form, which is quite different from how it's done with an AcroForm.
Here's some simple code to get you started. First a method to read the blank XML document and update it:
// Requires: using System.Collections.Generic; using System.Xml.Linq; using System.Xml.XPath;
public string FillXml(Dictionary<string, string> fields)
{
    // XML_INFILE => physical path to XML file exported from I-9
    XDocument xDoc = XDocument.Load(XML_INFILE);
    foreach (var kvp in fields)
    {
        // handle multiple elements in I-9 form: the same element name
        // can occur more than once, so every match is updated
        var elements = xDoc.XPathSelectElements(
            string.Format("//{0}", kvp.Key)
        );
        foreach (var e in elements) { e.Value = kvp.Value; }
    }
    return xDoc.ToString();
}
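As a quick illustration of the element-replacement step on its own (no PDF involved), here is a minimal sketch that runs the same XPath loop against a tiny in-memory document. The XML structure below is invented for the demo and is only a stand-in for the real export, which has many more elements. The class name `FillXmlDemo` is hypothetical too. The element name is the one that appears elsewhere in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

class FillXmlDemo
{
    static void Main()
    {
        // Hypothetical stand-in for the exported data: the same element
        // name occurs at two different depths.
        var xDoc = XDocument.Parse(
            "<form1>" +
              "<textFieldLastNameGlobal/>" +
              "<page2><textFieldLastNameGlobal/></page2>" +
            "</form1>");

        var fields = new Dictionary<string, string>
        {
            { "textFieldLastNameGlobal", "Doe" }
        };

        foreach (var kvp in fields)
        {
            // "//name" matches every element with that name, at any depth
            foreach (var e in xDoc.XPathSelectElements("//" + kvp.Key))
            {
                e.Value = kvp.Value;
            }
        }

        // Both occurrences now carry the value "Doe"
        Console.WriteLine(xDoc);
    }
}
```

Because the XPath starts with `//`, the dictionary key only needs to be the element name, not its full path in the exported XML.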
Now that we have a method to create valid XML, fill the form fields with some sample data:
var fields = new Dictionary<string, string>()
{
    { "textFieldLastNameGlobal", "Doe" },
    { "textFieldFirstNameGlobal", "Jane" }
};
var filledXml = FillXml(fields);
using (var ms = new MemoryStream())
{
    // PDF_READER => I-9 PdfReader instance
    using (PDF_READER)
    {
        // I-9 has password security
        PdfReader.unethicalreading = true;
        // maintain usage rights on output file
        using (var stamper = new PdfStamper(PDF_READER, ms, '\0', true))
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(filledXml);
            stamper.AcroFields.Xfa.FillXfaForm(doc.DocumentElement);
        }
    }
    File.WriteAllBytes(OUTFILE, ms.ToArray());
}
To answer your last question, how to determine the form 'type', use the PdfReader instance like so:
PDF_READER.AcroFields.Xfa.XfaPresent
true means XFA, false means AcroForm.
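If you want that check as something you can run against any file, a rough sketch might look like the following. The class and method names (`FormTypeCheck`, `DescribeFormType`) are made up for illustration, and the extra "no form fields found" branch is my own addition, on the assumption that a PDF with neither XFA nor AcroForm fields should be reported separately.

```csharp
using System;
using iTextSharp.text.pdf;

static class FormTypeCheck
{
    // Hypothetical helper: reports which kind of form a PDF carries.
    public static string DescribeFormType(string pdfPath)
    {
        // Same setting used when filling the I-9, which has password security
        PdfReader.unethicalreading = true;
        var reader = new PdfReader(pdfPath);
        try
        {
            AcroFields form = reader.AcroFields;
            if (form.Xfa.XfaPresent)
            {
                return "XFA";
            }
            // Assumption: no XFA and no AcroForm fields means no fillable form
            return form.Fields.Count > 0 ? "AcroForm" : "no form fields found";
        }
        finally
        {
            reader.Close();
        }
    }

    static void Main(string[] args)
    {
        // args[0] is assumed to be the path to the PDF to inspect
        Console.WriteLine(DescribeFormType(args[0]));
    }
}
```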