
acroform field.setRichTextValue is not working


I have a field from an AcroForm, and I see both field.setValue() and field.setRichTextValue(...). The first one sets the value correctly, but the second one does not seem to work: the rich text value is not displayed. Here is the code I'm using:

PDDocument pdfDocument = PDDocument.load(new File(SRC));
pdfDocument.getDocument().setIsXRefStream(true);
PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
acroForm.setNeedAppearances(false);

acroForm.getField("tenantDataValue").setValue("Deuxième texte");
acroForm.getField("tradingAddressValue").setValue("Text replacé");
acroForm.getField("buildingDataValue").setValue("Deuxième texte");
acroForm.getField("oldRentValue").setValue("750");
acroForm.getField("oldChargesValue").setValue("655");
acroForm.getField("newRentValue").setValue("415");
acroForm.getField("newChargesValue").setValue("358");
acroForm.getField("increaseEffectiveDateValue").setValue("Texte 3eme contenu");

// THIS RICH TEXT DOES NOT SHOW ANYTHING
PDTextField field = (PDTextField) acroForm.getField("tableData");
field.setRichText(true);
String val = "\\rtpara[size=12]{para1}{This is 12pt font, while \\span{size=8}{this is 8pt font.} OK?}";
field.setRichTextValue(val);

I expect the field named "tableData" to be set to the rich text value!

You can download the PDF form I am using with this code (download pdf form), and the output I get after running this code and flattening the form data (download output here).


Solution

  • To sum up what has been said in the comments to the question, plus some study of the working version:

    Wrong rich text format

    In his original code, the OP used this as the rich text value:

    String val = "\\rtpara[size=12]{para1}{This is 12pt font, while \\span{size=8}{this is 8pt font.} OK?}";
    

    which he took from this document. That document, however, is the manual for the LaTeX richtext package, which provides the commands and documentation needed to “easily” produce such rich text strings. In other words, the \rtpara... above is not PDF rich text but a LaTeX command that produces PDF rich text (when executed in a LaTeX context).

    The document actually even demonstrates this using the example

    \rtpara[indent=first]{para1}{Now is the time for
        \span{style={bold,italic,strikeit},color=ff0000}{J\374rgen}
        and all good men to come to the aid of \it{their}
        \bf{country}. Now is the time for \span{style=italic}
        {all good} women to do the same.}
    

    for which the instruction generates two values, a rich text value and a plain text value:

    \useRV{para1}: <p dir="ltr" style="text-indent:12pt;
        margin-top:0pt;margin-bottom:0pt;">Now is the time
        for <span style="text-decoration:line-through;
        font-weight:bold;font-style:italic;color:#ff0000;
        ">J\374rgen</span> and all good men to come to the
        aid of <i>their</i> <b>country</b>. Now is the
        time for <span style="font-style:italic;">all
        good</span> women to do the same.</p>
    \useV{para1}: Now is the time for J\374rgen and all
        good men to come to the aid of their country. Now
        is the time for all good women to do the same.
    

    As one can see in the \useRV{para1} result, PDF rich text uses a cut-down form of (X)HTML markup.
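    To make the difference concrete, here is a minimal Java sketch that assembles by hand a PDF rich text string equivalent to the OP's \rtpara example. The class RichTextBuilder and its methods are hypothetical illustrations, not part of PDFBox; the sketch only shows the shape of the markup (an XHTML body containing p and span elements styled via CSS) and escapes XML-significant characters in text and attribute values.

```java
// Hypothetical helper (not part of PDFBox) that assembles a PDF rich text
// string: an XHTML <body> holding one <p> with optional styled <span>s.
class RichTextBuilder {

    // Escapes characters that are significant in XML text and attribute values.
    // '&' must be replaced first so the other entities are not double-escaped.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;");
    }

    private final StringBuilder paragraph = new StringBuilder();

    // Appends unstyled text to the paragraph.
    RichTextBuilder text(String text) {
        paragraph.append(escape(text));
        return this;
    }

    // Appends text wrapped in a <span> carrying the given CSS style.
    RichTextBuilder span(String style, String text) {
        paragraph.append("<span style=\"").append(escape(style)).append("\">")
                 .append(escape(text)).append("</span>");
        return this;
    }

    // Wraps the paragraph in a <p> with the given style and an XHTML <body>.
    String build(String paragraphStyle) {
        return "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p style=\"" + escape(paragraphStyle) + "\">"
                + paragraph + "</p></body>";
    }

    public static void main(String[] args) {
        String rv = new RichTextBuilder()
                .text("This is 12pt font, while ")
                .span("font-size:8pt", "this is 8pt font.")
                .text(" OK?")
                .build("font-size:12pt");
        System.out.println(rv);
    }
}
```

    Running it prints <body xmlns="http://www.w3.org/1999/xhtml"><p style="font-size:12pt">This is 12pt font, while <span style="font-size:8pt">this is 8pt font.</span> OK?</p></body>, which is the kind of string the RV entry is meant to hold, unlike the LaTeX command.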

    For more details, please look up the PDF specification, e.g. section 12.7.3.4 "Rich Text Strings" in the copy of ISO 32000-1 published by Adobe here.

    PDFBox does not create rich text appearances

    In his original code, the OP uses

    acroForm.setNeedAppearances(false);
    

    This sets a flag claiming that all form fields have appearance streams (which describe the visual appearance of the respective form field including its content) and that these streams represent the current value of the field. It effectively tells the next processor of the PDF that it can use these appearance streams as-is and does not need to generate them itself.

    As @Tilman quoted from the JavaDocs, though,

    /**
     * Set the fields rich text value.
     * 
     * <p>
     * Setting the rich text value will not generate the appearance
     * for the field.
     * <br>
     * You can set {@link PDAcroForm#setNeedAppearances(Boolean)} to
     * signal a conforming reader to generate the appearance stream.
     * </p>
     * 
     * Providing null as the value will remove the default style string.
     * 
     * @param richTextValue a rich text string
     */
    public void setRichTextValue(String richTextValue)
    

    So setRichTextValue does not create an appropriate appearance stream for the field. Therefore, to signal to the next processor of the PDF (in particular a viewer or form flattener) that it has to generate the appearances, one needs to use

    acroForm.setNeedAppearances(true);
    

    Making Adobe Acrobat (Reader) generate the appearance from rich text

    When asked to generate field appearances for a rich text field, Adobe Acrobat can base them either on the rich text value RV or on the plain text value V. I did some quick checks, and Adobe Acrobat appears to use these strategies:

    1. If RV is set and the value of V equals the value of RV without the rich text markup, Adobe Acrobat assumes the value of RV to be up-to-date and generates an appearance from this rich text string according to the PDF specification. Otherwise the value of RV (if present at all) is assumed to be outdated and is ignored!

    2. Otherwise, if the V value contains rich text markup, Adobe Acrobat assumes this value to be rich text and creates the appearance according to this styling.

      This is not according to the PDF specification.

      Presumably some software products used to incorrectly put the rich text into the V value, and Adobe Acrobat started to support this misuse for better compatibility.

    3. Otherwise, the V value is used as a plain string, and an appearance is generated accordingly.
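    The three strategies above can be summarized as a decision function. The following Java sketch is only a model of the observed behavior, not Adobe's implementation. The class AppearanceSourceModel is hypothetical, treating "contains rich text markup" as "parses as XML with a body root element" is an assumption of the sketch, and "RV without the markup" is approximated by the text content of the parsed XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Models the Acrobat behavior observed above. This is NOT Adobe's code, just
// the three observed strategies written as a decision function.
class AppearanceSourceModel {

    enum Source { RICH_TEXT_RV, RICH_TEXT_IN_V, PLAIN_V }

    // Parses a string as XML; returns null if it is not well-formed.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            return null;
        }
    }

    // Assumption: "contains rich text markup" means "is XML with a <body> root".
    static boolean isRichText(String value) {
        Document doc = parse(value);
        return doc != null && "body".equals(doc.getDocumentElement().getLocalName());
    }

    // The rich text string with all markup removed, or null if it is not XML.
    static String plainText(String richText) {
        Document doc = parse(richText);
        return doc == null ? null : doc.getDocumentElement().getTextContent();
    }

    static Source choose(String rv, String v) {
        // 1. RV is used only if V equals RV without its markup.
        if (rv != null && v != null && v.equals(plainText(rv))) {
            return Source.RICH_TEXT_RV;
        }
        // 2. Not per the PDF specification: V itself carries rich text.
        if (v != null && isRichText(v)) {
            return Source.RICH_TEXT_IN_V;
        }
        // 3. V is rendered as a plain string.
        return Source.PLAIN_V;
    }

    public static void main(String[] args) {
        String rv = "<body xmlns=\"http://www.w3.org/1999/xhtml\"><p>Red</p></body>";
        System.out.println(choose(rv, "Red"));  // RICH_TEXT_RV
        System.out.println(choose(rv, "Blue")); // PLAIN_V
        System.out.println(choose(null, rv));   // RICH_TEXT_IN_V
    }
}
```

    In this model the OP's original input (a LaTeX command as RV, which is not even XML) can only ever fall through to the plain V value.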

    These strategies explain why the OP's original approach using only

    field.setRichTextValue(val);
    

    showed no change: since V did not equal the plain text of RV, Adobe Acrobat ignored the rich text value.

    The same strategies also explain his observation:

    then instead of setRichTextValue simply using field.setValue("<body xmlns=\"http://www.w3.org/1999/xhtml\"><p style=\"color:#FF0000;\">Red&#13;</p><p style=\"color:#1E487C;\">Blue&#13;</p></body>") works ! in acrobat reader (without flatten) the field is correctly formatted

    Be aware, though, that this is beyond the PDF specification. If you want to generate valid PDF, you have to set both RV and V and have the latter contain the plain version of the rich text of the former.

    For example, use

    String val = "<?xml version=\"1.0\"?>"
            + "<body xfa:APIVersion=\"Acroform:2.7.0.0\" xfa:spec=\"2.1\" xmlns=\"http://www.w3.org/1999/xhtml\" xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
            + "<p dir=\"ltr\" style=\"margin-top:0pt;margin-bottom:0pt;font-family:Helvetica;font-size:12pt\">"
            + "This is 12pt font, while "
            + "<span style=\"font-size:8pt\">this is 8pt font.</span>"
            + " OK?"
            + "</p>"
            + "</body>";
    String valClean = "This is 12pt font, while this is 8pt font. OK?";
    // V must hold the plain-text version of the rich text in RV
    field.setValue(valClean);
    field.setRichTextValue(val);
    

    or

    String val = "<body xmlns=\"http://www.w3.org/1999/xhtml\"><p style=\"color:#FF0000;\">Red&#13;</p><p style=\"color:#1E487C;\">Blue&#13;</p></body>";
    // the &#13; character references in val correspond to \r in the plain text
    String valClean = "Red\rBlue\r";
    field.setValue(valClean);
    field.setRichTextValue(val);
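    In both examples the plain value valClean is written by hand. A minimal sketch of deriving it from the rich text instead, assuming RV is well-formed XHTML, is to parse it with the JDK's DOM parser and take the root element's text content. The class PlainTextFromRichText is a hypothetical helper, not a PDFBox API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not part of PDFBox): derives the plain V value from an
// XHTML rich text RV value by parsing it and collecting its text content.
class PlainTextFromRichText {

    static String toPlainText(String richText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(richText)));
            // Document.getTextContent() is always null, so ask the root element.
            // The parser has already turned references like &#13; into characters.
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed rich text", e);
        }
    }

    public static void main(String[] args) {
        String val = "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p style=\"color:#FF0000;\">Red&#13;</p>"
                + "<p style=\"color:#1E487C;\">Blue&#13;</p></body>";
        String valClean = toPlainText(val);
        System.out.println(valClean.equals("Red\rBlue\r")); // true
    }
}
```

    The result would then go into field.setValue(...) before field.setRichTextValue(val) is called. Two caveats: paragraph breaks that exist only as markup (separate p elements without a &#13;) produce no separator, and whitespace from pretty-printed markup is kept. Whether Acrobat's comparison treats these cases the same way is untested.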