Search code examples
javahtmlxsltxsl-foapache-fop

PDF report with embedded HTML


We have a Java-based system that reads data from a database, merges individual data fields with preset XSL-FO tags and converts the result to PDF with Apache FOP.

In XSL-FO format it looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE Html [
<!ENTITY nbsp  "&#160;"> 
    <!-- all other entities -->
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <xsl:output method="xml" indent="yes" />
    <xsl:template match="/">

        <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:svg="http://www.w3.org/2000/svg" font-family="..." font-size="...">
            <fo:layout-master-set>          
                <fo:simple-page-master master-name="Letter Page" page-width="8.500in" page-height="11.000in">

                    <!-- appropriate settings -->

                </fo:simple-page-master>
            </fo:layout-master-set>
            <fo:page-sequence master-reference="Letter Page">

                <!-- some static content -->

            <fo:flow flow-name="xsl-region-body">
                    <fo:block>
                        <fo:table ...>
                            <fo:table-column ... />
                            <fo:table-body>
                                <fo:table-row>
                                    <fo:table-cell ...>
                                        <fo:block text-align="...">
                                            <fo:inline font-size="..." font-weight="...">
                                                <!-- Header / Title -->
                                            </fo:inline>
                                        </fo:block>
                                    </fo:table-cell>
                                </fo:table-row>
                            </fo:table-body>
                        </fo:table>
                    </fo:block>

                    <fo:block>

                        <fo:table ...>
                            <fo:table-column ... />
                            <fo:table-body> 
                                <fo:table-row>
                                    <fo:table-cell>
                                        <fo:block ...>
                                            <!-- Field A -->                                
                                        </fo:block>
                                    </fo:table-cell>
                                </fo:table-row>
                            </fo:table-body>
                        </fo:table>

                        <!-- Other fields in a very similar fashion as the above "Field A" -->

                    </fo:block>

                </fo:flow>      

            </fo:page-sequence>

        </fo:root>              

    </xsl:template>

</xsl:stylesheet>

Now I am looking for a way to allow some of the fields to contain static HTML-formatted content. This content will be generated by our HTML-enabled editor (something along the lines of CLEditor, CKEditor, etc.) or pasted from outside.

My plan is to follow the recipe from this JavaWorld article:

  • use JTidy to convert HTML-formatted string to proper XHTML
  • further modify xhtml2fo.xsl from Antenna House to remove all document-wide and page-wide transformations
  • apply this modified XSLT to my XHTML string (javax.xml.transform)
  • extract all the nodes under the root with XPath (javax.xml.xpath)
  • feed the result directly into existing XSL-FO document

I have a bare-bone version of such code and got the following error:

(Location of error unknown)org.apache.fop1.fo.ValidationException: "{http://www.w3.org/1999/XSL/Format}table-body" is not a valid child of "fo:block"! (No context info available)

My questions:

  1. What would be the way to troubleshoot this issue?
  2. Can <fo:block> serve as a generic container with other objects (including tables) nested inside?
  3. Is this an overall reasonable approach to solving the task?

If someone already "been there done that", please share your experience.


Solution

  • The best way to troubleshoot is to use a validating viewer/editor to examine the XSL FO. Many (such as oXygen) will show you errors in XSL FO structure as you open them and they will describe the issue (just as the error reported).

    In your case, you obviously have an fo:table-body as a child of fo:block. It cannot be. An fo:table-body have but one valid parent, fo:table. You are either missing the fo:table tag or you have erroneously inserted an fo:block in this position.

    In my opinion, I might do things slightly different. I would put the XHTML content inline into the XSL FO right where you want it. Then I would create an identity transform that copies over all the content that is fo-based, but converts the XHTML parts using XSL. This way, you can actually step that transform in an XSL editor like oXygen and see where errors occur and exactly why. Like any other degugger.

    Note: You may wish to look at other XSLs also, especially if your HTML may have any style="" CSS attributes. If this is the case it is not simple HTML, then you will need a better method for processing the HTML with CSS to FO.

    http://www.cloudformatter.com/css2pdf is based on this complete transform. That general stylesheet is available here: http://xep.cloudformatter.com/doc/XSL/xeponline-fo-translate-2.xsl

    I am the author of that stylesheet. It does much more than you ask, but has a fairly complex parsing recursion for converting CSS styling into XSL FO attributes.