
Export PDF as XML and fill the tags with Delphi


I can't get through this without some help.

I'm trying to export a PDF document to XML/XDP using the correct menu in Acrobat Reader.

I can see all my fields in the document in Acrobat Reader and can fill them by code with Delphi except 3 tags.

The problem comes when I export the fields' contents to XML: the XML does not contain three of the field tags it ought to contain. They are simply absent from the XML generated by Acrobat Reader, although, as I've said, the fields are definitely usable and not hidden in the PDF form itself.

For example :

Name : <Name></Name>

First name : <Firstname></Firstname>

Date : Missing in XML file

What could explain the fact that the three fields are not exported to XML whereas all the rest are, and how could I investigate what the cause of the difference is?

I hope I gave you enough information to try to help me.


Solution

  • I hesitate to post this as an answer, because it will not directly answer your question, but it may point you in the right direction to do some self-help.

    Unless your PDF file is corrupt, the only rational explanation for your 3 problem fields is something to do with their attributes stored in the PDF file. As you'll see from the extract from the Acrobat Forms interface below, each field has a large number of possible attributes, and I'm confident that you should be able to identify the difference(s) which account for your problem fields' different behaviour.

    AFORMAUTLib_TLB.Pas is the import unit I generated from the Forms plug-in for Acrobat8 (which is a bit old now, but I don't think that matters). The extract from it shows the interface for an Acrobat IField and the umpteen properties/attributes which can be set for a field in an Acrobat form.

    So, in your position, if I absolutely had to use the XML generated by Reader, what I would do would be to write some code using the objects in AFORMAUTLib_TLB to dump the properties of the form's fields and see if I could identify whatever differences there are between your three problem fields and the rest. Working with the objects in AFORMAUTLib_TLB is very straightforward - basically, there is a FormApp object which allows you to open an Acrobat Form and provides access to its IFields collection, which contains an IField instance for each field defined on the form. All this is thoroughly documented in the Acrobat SDK available from Adobe's site.
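
    As a rough illustration of the "dump the properties" idea, here is a minimal sketch that uses only members of the IField interface shown in the extract at the end of this answer. The unit name FieldDump and the procedure DumpField are my own inventions, and I am assuming the properties read here are valid for every field type; if one of them raises an exception for a particular kind of field, drop it from the list. How you obtain each IField (via the FormApp object and its IFields collection) is covered in the SDK documentation.

    ```delphi
    unit FieldDump; // hypothetical unit name, for illustration only

    interface

    uses
      SysUtils, Classes,
      AFORMAUTLib_TLB; // import unit generated from the Acrobat Forms type library

    // Appends one line describing the given field to Report.
    procedure DumpField(const Field: IField; Report: TStrings);

    implementation

    procedure DumpField(const Field: IField; Report: TStrings);
    begin
      // Only attributes that appear in the IField extract are read here.
      Report.Add(
        Field.Name +
        ': type=' + Field.Type_ +
        ' value="' + Field.Value + '"' +
        ' default="' + Field.DefaultValue + '"' +
        ' hidden=' + BoolToStr(Field.IsHidden, True) +
        ' noview=' + BoolToStr(Field.NoViewFlag, True) +
        ' readonly=' + BoolToStr(Field.IsReadOnly, True) +
        ' required=' + BoolToStr(Field.IsRequired, True) +
        ' terminal=' + BoolToStr(Field.IsTerminal, True));
    end;

    end.
    ```

    Calling this once for a field that exports correctly (say Name) and once for a problem field (say Date), with something like a memo's Lines as the Report, gives two lines you can compare by eye.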

    If you spend an hour or two playing with the Forms interfaces, I'm fairly sure you'll end up being tempted to avoid using Reader's XML output and simply generate your own from the IField objects in the form. That's assuming you actually need the XML at all of course.
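
    To give an idea of how little is involved in generating the XML yourself, here is a small sketch. It deliberately works on plain strings so that it runs without Acrobat; in real use the Name and Value arguments would come from IField.Name and IField.Value. The function names XmlEscape and FieldToXml are made up for this example, and I am assuming your field names are already valid XML element names, which the sketch does not check.

    ```delphi
    program FieldXmlSketch;
    {$APPTYPE CONSOLE}

    uses
      SysUtils;

    // Escapes the characters that may not appear literally in XML text content.
    function XmlEscape(const S: string): string;
    begin
      // The ampersand must be replaced first so later entities are not re-escaped.
      Result := StringReplace(S, '&', '&amp;', [rfReplaceAll]);
      Result := StringReplace(Result, '<', '&lt;', [rfReplaceAll]);
      Result := StringReplace(Result, '>', '&gt;', [rfReplaceAll]);
    end;

    // Builds one element per field; an empty value still yields a tag pair.
    function FieldToXml(const Name, Value: string): string;
    begin
      Result := '<' + Name + '>' + XmlEscape(Value) + '</' + Name + '>';
    end;

    begin
      Writeln(FieldToXml('Name', 'Smith & Sons')); // <Name>Smith &amp; Sons</Name>
      Writeln(FieldToXml('Date', ''));             // <Date></Date>
    end.
    ```

    Because the element is written whether or not the value is empty, every field you pass in shows up in the output.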

    What I don't know for sure (and don't want to install Reader to find out) is whether Acrobat Reader has the same plug-in for handling form fields. Obviously, if it doesn't, you are out of luck with this approach.

    Good luck!

    PS: Once you have an IField interface to a field of interest, you can tweak its attributes and contents at run-time, so if you can find the difference(s) causing the problem, it could be very straightforward to apply a run-time fix.

    Also, btw, the interface objects in the import unit make it easy to turn a plain PDF document into a form and optionally fill it in, if that's what you need. I'm not sure whether anything has been done to inhibit this functionality in Reader, though - if you get into doing a lot of forms work, a copy of the full version of Acrobat is pretty much indispensable.

    Update: I'm not sure whether the facility Acrobat 8 has for exporting form data (under Forms | Manage Form Data | Export Data | save as type : XML) is functionally identical to what you are using in Reader, but simple observation of its behaviour shows that if a field is empty at the time the Export Data function is run, no XML tag for it is included in the exported XML. This is regardless of whether the form has been saved to disk since the field was emptied. So, if that is the case with your form, a possible work-around would be to temporarily set the field's value to something non-empty, export the PDF to XML and then abandon the change.
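
    If empty fields do turn out to be the cause, the work-around could be sketched along these lines, again using nothing but the Value property from the IField extract. The unit and routine names are hypothetical, the placeholder text is whatever you choose, and the export itself still has to be triggered separately between the two calls. I am assuming that writing an empty string back is equivalent to the field's original empty state; simply closing the document without saving would be the other way to abandon the change.

    ```delphi
    unit EmptyFieldWorkaround; // hypothetical unit name, for illustration only

    interface

    uses
      AFORMAUTLib_TLB; // import unit generated from the Acrobat Forms type library

    // Returns True if the field was empty and now holds the placeholder.
    function FillIfEmpty(const Field: IField; const Placeholder: WideString): Boolean;

    // Empties the field again; call it only for fields FillIfEmpty changed.
    procedure RestoreEmpty(const Field: IField);

    implementation

    function FillIfEmpty(const Field: IField; const Placeholder: WideString): Boolean;
    begin
      Result := Field.Value = '';
      if Result then
        Field.Value := Placeholder; // temporary, so the export has something to write
    end;

    procedure RestoreEmpty(const Field: IField);
    begin
      Field.Value := '';
    end;

    end.
    ```

    Keeping the Boolean result for each field lets you restore only the ones you actually touched, and strip the placeholder from the exported XML afterwards if you need the tags to be empty.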

    // Type Lib: D:\Program Files\Adobe\Acrobat 8.0\Acrobat\plug_ins\AcroForm.api (1)
    // IID\LCID: {7CD06992-50AA-11D1-B8F0-00A0C9259304}\0
    
    // *********************************************************************//
    // Interface: IField
    // Flags:     (4416) Dual OleAutomation Dispatchable
    // GUID:      {673E8454-7646-11D1-B90B-00A0C9259304}
    // *********************************************************************//
      IField = interface(IDispatch)
        ['{673E8454-7646-11D1-B90B-00A0C9259304}']
        function  Get_Name: WideString; safecall;
        function  Get_Value: WideString; safecall;
        procedure Set_Value(const pbstrVal: WideString); safecall;
        function  Get_IsHidden: WordBool; safecall;
        procedure Set_IsHidden(pIsHidden: WordBool); safecall;
        function  Get_IsTerminal: WordBool; safecall;
        function  Get_Type_: WideString; safecall;
        function  Get_IsReadOnly: WordBool; safecall;
        procedure Set_IsReadOnly(pIsRO: WordBool); safecall;
        function  Get_IsRequired: WordBool; safecall;
        procedure Set_IsRequired(pIsReqd: WordBool); safecall;
        function  Get_PrintFlag: WordBool; safecall;
        procedure Set_PrintFlag(pIsPrint: WordBool); safecall;
        procedure SetBorderColor(const bstrColorSpace: WideString; GorRorC: Single; GorM: Single;
                                 BorY: Single; K: Single); safecall;
        procedure SetBackgroundColor(const bstrColorSpace: WideString; GorRorC: Single; GorM: Single;
                                     BorY: Single; K: Single); safecall;
        function  Get_BorderWidth: Smallint; safecall;
        procedure Set_BorderWidth(pVal: Smallint); safecall;
        function  Get_Alignment: WideString; safecall;
        procedure Set_Alignment(const pVal: WideString); safecall;
        function  Get_CharLimit: Smallint; safecall;
        procedure Set_CharLimit(pVal: Smallint); safecall;
        function  Get_DefaultValue: WideString; safecall;
        procedure Set_DefaultValue(const pVal: WideString); safecall;
        function  Get_IsMultiline: WordBool; safecall;
        procedure Set_IsMultiline(pVal: WordBool); safecall;
        function  Get_IsPassword: WordBool; safecall;
        procedure Set_IsPassword(pVal: WordBool); safecall;
        procedure SetExportValues(arrExportVal: OleVariant); safecall;
        procedure SetJavaScriptAction(const bstrTrigger: WideString; const bstrTheScript: WideString); safecall;
        procedure SetSubmitFormAction(const bstrTrigger: WideString; const bstrTheURL: WideString;
                                      theFlags: Integer; arrFields: OleVariant); safecall;
        procedure SetResetFormAction(const bstrTrigger: WideString; theFlags: Integer;
                                     arrFields: OleVariant); safecall;
        procedure SetButtonIcon(const bstrFace: WideString; const bstrFullPath: WideString;
                                pageNum: Smallint); safecall;
        function  Get_CalcOrderIndex: Smallint; safecall;
        procedure Set_CalcOrderIndex(pVal: Smallint); safecall;
        function  Get_BorderStyle: WideString; safecall;
        procedure Set_BorderStyle(const pVal: WideString); safecall;
        procedure SetForegroundColor(const bstrColorSpace: WideString; GorRorC: Single; GorM: Single;
                                     BorY: Single; K: Single); safecall;
        procedure PopulateListOrComboBox(arrItems: OleVariant; arrExportVal: OleVariant); safecall;
        function  Get_Editable: WordBool; safecall;
        procedure Set_Editable(pVal: WordBool); safecall;
        function  Get_Highlight: WideString; safecall;
        procedure Set_Highlight(const pVal: WideString); safecall;
        function  Get_Style: WideString; safecall;
        procedure Set_Style(const pVal: WideString); safecall;
        function  Get_TextFont: WideString; safecall;
        procedure Set_TextFont(const pVal: WideString); safecall;
        function  Get_TextSize: Smallint; safecall;
        procedure Set_TextSize(pVal: Smallint); safecall;
        procedure SetButtonCaption(const bstrFace: WideString; const bstrCaption: WideString); safecall;
        function  Get_ButtonLayout: Smallint; safecall;
        procedure Set_ButtonLayout(pVal: Smallint); safecall;
        function  Get_NoViewFlag: WordBool; safecall;
        procedure Set_NoViewFlag(pVal: WordBool); safecall;
        property Name: WideString read Get_Name;
        property Value: WideString read Get_Value write Set_Value;
        property IsHidden: WordBool read Get_IsHidden write Set_IsHidden;
        property IsTerminal: WordBool read Get_IsTerminal;
        property Type_: WideString read Get_Type_;
        property IsReadOnly: WordBool read Get_IsReadOnly write Set_IsReadOnly;
        property IsRequired: WordBool read Get_IsRequired write Set_IsRequired;
        property PrintFlag: WordBool read Get_PrintFlag write Set_PrintFlag;
        property BorderWidth: Smallint read Get_BorderWidth write Set_BorderWidth;
        property Alignment: WideString read Get_Alignment write Set_Alignment;
        property CharLimit: Smallint read Get_CharLimit write Set_CharLimit;
        property DefaultValue: WideString read Get_DefaultValue write Set_DefaultValue;
        property IsMultiline: WordBool read Get_IsMultiline write Set_IsMultiline;
        property IsPassword: WordBool read Get_IsPassword write Set_IsPassword;
        property CalcOrderIndex: Smallint read Get_CalcOrderIndex write Set_CalcOrderIndex;
        property BorderStyle: WideString read Get_BorderStyle write Set_BorderStyle;
        property Editable: WordBool read Get_Editable write Set_Editable;
        property Highlight: WideString read Get_Highlight write Set_Highlight;
        property Style: WideString read Get_Style write Set_Style;
        property TextFont: WideString read Get_TextFont write Set_TextFont;
        property TextSize: Smallint read Get_TextSize write Set_TextSize;
        property ButtonLayout: Smallint read Get_ButtonLayout write Set_ButtonLayout;
        property NoViewFlag: WordBool read Get_NoViewFlag write Set_NoViewFlag;
      end;