Tags: jasper-reports, multilingual, export-to-pdf, tagged-pdf

How to produce a bilingual tagged-PDF output from Jaspersoft / JRXML?


We're using Jaspersoft iReport Designer to create bilingual PDF outputs—each file contains both English and French text.

For accessibility reasons, we'd like to tag each block of text with its appropriate language in the resulting PDF. See PDF19: Specifying the language for a passage or phrase with the Lang entry in PDF documents for what we're trying to do.

Modifying the PDF files manually is not an option since we email them directly to our users.

Does Jaspersoft support this?


Solution

  • No. JasperReports versions up to and including 6.7.0 do not support this: the Configuration Reference has no property for setting a Lang entry on an individual text element.

    You have 2 options:

    1. Post-process the PDF with, for example, iText, as Dave Jarvis suggested. You can either try to change the dictionary in place or recreate the PDF with the additional information. Both approaches are fairly complex, and run time will naturally increase because you need to read and rewrite the PDF.

    2. Modify the JasperReports source code to add support. Either modify JRPdfExporter and JRPdfExporterTagHelper directly, or add your own exporter (to keep the original library intact).

    In this answer I will show how to modify the original library to write additional tag entries (here, a /Lang entry in the tag's dictionary).

    Background

    The example in PDF19: Specifying the language for a passage or phrase with the Lang entry in PDF documents shows this PDF object tree in iText RUPS:

    [Image: PDF object tree of the PDF19 example in iText RUPS]

    I will assume that it is sufficient to add a /Lang entry to the tag of any text that is not in the PDF's default language. Note: if you need to add other entries as well, the technique is the same; you only need to adapt the code example below.

    Source code modification

    Add a new property, net.sf.jasperreports.export.pdf.tag.lang. If it is present on the reportElement of a text element, add a /Lang entry with the property's value to the tag's dictionary.
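
To make the rule concrete without the JasperReports and iText types, here is a minimal, library-free sketch. Plain maps stand in for the element's properties and for the tag's dictionary; the class LangRule and its method names are hypothetical and exist only for illustration. The property key is the one introduced above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical, library-free model of the rule; not part of JasperReports.
class LangRule {
    // Key that the patched exporter looks up on a text element.
    static final String PROPERTY_TAG_LANG = "net.sf.jasperreports.export.pdf.tag.lang";

    // Returns the value to store under /Lang, or empty when the element
    // does not carry the property and no entry should be written.
    static Optional<String> langFor(Map<String, String> elementProperties) {
        return Optional.ofNullable(elementProperties.get(PROPERTY_TAG_LANG));
    }

    // Builds a stand-in for the tag's dictionary: /Lang is added only
    // when the property is present on the element.
    static Map<String, String> tagDictionaryFor(Map<String, String> elementProperties) {
        Map<String, String> dictionary = new HashMap<>();
        langFor(elementProperties).ifPresent(lang -> dictionary.put("/Lang", lang));
        return dictionary;
    }
}
```

An element without the property yields an empty dictionary here, which mirrors the intent of the modification: such text gets no /Lang entry of its own.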

    Modification to JRPdfExporterTagHelper.java

    Add a static property identifier, following the existing code style:

    public static final String PROPERTY_TAG_LANG = JRPdfExporter.PDF_EXPORTER_PROPERTIES_PREFIX + "tag.lang";
    

    Modify startText(boolean isHyperlink) and startText(String text, boolean isHyperlink). Only the first method is shown here (the principle is the same for the second). The method signature needs an additional JRPrintText parameter so that we can retrieve the element's properties.

    protected void startText(JRPrintText text, boolean isHyperlink)
    {
        if (isTagged)
        {
            PdfStructureElement textTag = new PdfStructureElement(tagStack.peek(), isHyperlink ? PdfName.LINK : PdfName.TEXT);
            // Add /Lang only when the element carries the custom property
            if (text.hasProperties() && text.getPropertiesMap().containsProperty(PROPERTY_TAG_LANG))
            {
                textTag.put(PdfName.LANG, new PdfString(text.getPropertiesMap().getProperty(PROPERTY_TAG_LANG)));
            }
            pdfContentByte.beginMarkedContentSequence(textTag);
        }
    }
    

    Since the method signature changed, JRPdfExporter.java must be modified as well so that the code compiles again.

    Modify exportText(JRPrintText text)

    ...
    if (glyphRendererAddActualText && textRenderer instanceof PdfGlyphRenderer)
    {
        tagHelper.startText(text, styledText.getText(), text.getLinkType() != null);
    }
    else
    {
        tagHelper.startText(text, text.getLinkType() != null);
    }
    ...
    

    You could drop the boolean argument text.getLinkType() != null, since the text object itself is now passed in, but I kept the code close to the original to keep the example simple.

    Example

    jrxml

    <?xml version="1.0" encoding="UTF-8"?>
    <jasperReport xmlns="http://jasperreports.sourceforge.net/jasperreports" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://jasperreports.sourceforge.net/jasperreports http://jasperreports.sourceforge.net/xsd/jasperreport.xsd" name="TaggedPdf" pageWidth="595" pageHeight="842" columnWidth="555" leftMargin="20" rightMargin="20" topMargin="20" bottomMargin="20" uuid="1be2df3d-cbc1-467c-8729-1ed569eb8a0d">
        <property name="net.sf.jasperreports.export.pdf.tagged" value="true"/>
        <property name="net.sf.jasperreports.export.pdf.tag.language" value="EN-US"/>
        <property name="com.jaspersoft.studio.data.defaultdataadapter" value="One Empty Record"/>
        <queryString>
            <![CDATA[]]>
        </queryString>
        <title>
            <band height="67" splitType="Stretch">
                <staticText>
                    <reportElement x="0" y="0" width="240" height="30" uuid="0722eadc-3fd6-4c4d-811c-64fbd18e0af5"/>
                    <textElement verticalAlignment="Middle"/>
                    <text><![CDATA[Hello world]]></text>
                </staticText>
                <staticText>
                    <reportElement x="0" y="30" width="240" height="30" uuid="5080190e-e9fd-4df6-b0f6-f1be3c109805">
                        <property name="net.sf.jasperreports.export.pdf.tag.lang" value="FR"/>
                    </reportElement>
                    <textElement verticalAlignment="Middle"/>
                    <text><![CDATA[Bonjour monde]]></text>
                </staticText>
            </band>
        </title>
    </jasperReport>
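
The example writes the language values in upper case (EN-US, FR). Assuming the property value is meant to be a BCP 47 style language tag, a patched exporter could validate and normalize it before writing it to /Lang. The following sketch uses only the JDK; LangTagNormalizer is a hypothetical helper, not part of JasperReports.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: validates and normalizes a language tag taken
// from the report property before it is written to /Lang.
class LangTagNormalizer {
    // Returns the normalized tag, or empty when the value is missing,
    // blank, or not a well-formed language tag.
    static Optional<String> normalize(String rawTag) {
        if (rawTag == null || rawTag.trim().isEmpty()) {
            return Optional.empty();
        }
        try {
            // Locale.Builder rejects ill-formed tags instead of silently
            // falling back to an undetermined locale.
            Locale locale = new Locale.Builder().setLanguageTag(rawTag.trim()).build();
            return Optional.of(locale.toLanguageTag());
        } catch (IllformedLocaleException e) {
            return Optional.empty();
        }
    }
}
```

With this helper, normalize("FR") yields fr and normalize("EN-US") yields en-US. An ill-formed value such as "not a tag" yields an empty Optional, in which case the exporter could simply skip the /Lang entry.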
    

    Exported to PDF with the modifications above and inspected with iText RUPS:

    [Image: final output in iText RUPS]

    Is this enough according to PDF19: Specifying the language for a passage or phrase with the Lang entry in PDF documents? The technique states:

    "Verify that the language of a passage, phrase, or word that differs from the language of the surrounding text is correctly specified by a /Lang entry on an enclosing tag or container."

    As far as I can see, yes, but I'm not an expert in this matter. In any case, if you need to add other entries, the procedure is the same.