Tags: xml, xslt, xsl-fo, apache-fop

XSL style sheet for XML to XSL-FO


I need to convert XML files to PDF and plan to do it through XSL-FO. The source XML files follow their own structure and vocabulary (NITF) and must not be changed, so I have to write a specific XSL stylesheet for them. Of all the XML elements, I only need a few:

text: <p>, <ul>, <li>

tables: <tr>, <td>

images: <media-reference mime-type="application/gif" source="foo.gif">
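The stylesheet further down has no template for the image element yet. A template along the following lines could map media-reference to fo:external-graphic. It is a sketch that assumes the source attribute holds a path FOP can resolve (relative paths are resolved against FOP's base URL) and that media-reference appears where a block is allowed; the space-after value and the scale-down-to-fit sizing are illustrative choices, not NITF requirements.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Map a NITF media-reference to an FO graphic in its own block.
       The source attribute is passed through as url('...'); FOP resolves
       a relative path such as "foo.gif" against its base URL (assumed to
       be the folder that holds the images). -->
  <xsl:template match="media-reference[@source]">
    <fo:block space-after="6pt">
      <!-- Shrink wide images to the column width, never enlarge them -->
      <fo:external-graphic src="url('{@source}')"
                           width="100%" content-width="scale-down-to-fit"/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```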

So far I have managed to convert the text part of the XML files, and I can process files that contain only a simple table with a fixed number of columns. When I try to process both text and tables in the same source file, I get transformation errors. My (badly working) stylesheet my.xsl and the source file are shown below. The errors look like this:

org.apache.fop.fo.ValidationException: "fo:table-body" is missing child elements. Required content model: marker* (table-row+|table-cell+)
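In other words, FOP received an fo:table-body with no fo:table-row inside it. For comparison, a hand-written minimal table that satisfies the content model quoted in the error (one column, one row, one cell) looks like this; it is an illustration, not output generated by my.xsl.

```xml
<!-- fo:table-body must contain at least one fo:table-row (or fo:table-cell);
     an empty fo:table-body is what triggers the ValidationException. -->
<fo:table xmlns:fo="http://www.w3.org/1999/XSL/Format"
          table-layout="fixed" width="100%">
  <fo:table-column column-width="proportional-column-width(1)"/>
  <fo:table-body>
    <fo:table-row>
      <fo:table-cell border-style="solid">
        <!-- Cell content must itself be block-level -->
        <fo:block>cell text</fo:block>
      </fo:table-cell>
    </fo:table-row>
  </fo:table-body>
</fo:table>
```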

XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE nitf SYSTEM "nitf.dtd">
<nitf>
<head>
    <title type="main">Sub-title 1</title>
    <meta name="filetype" content="content"/>
    <docdata><document-id id-string="123456" /></docdata>
</head>
<body>
    <body.head>
        <hedline><hl1>Sub-title 1</hl1></hedline>
    </body.head>
    <body.content>
        <ul>
            <li>Some long text 1</li><li>Some long text 2</li>
        </ul>
        <table  id="0001.csv">
            <tbody>
                <tr>
                    <td colspan="4" class="tbh">Table title 1</td>
                </tr>
                <tr>
                    <td colspan="1" class="tbc">&#160;</td>
                    <td colspan="1" class="tbc-r">Col title 1</td>
                    <td colspan="1" class="tbc-r">Col title 2</td>
                    <td colspan="1" class="tbc-r">Col title 3</td>
                </tr>
                <tr>
                    <td colspan="1" class="tbd">Row title 1</td>
                    <td colspan="1" class="tbd-r">cell text 1</td>
                    <td colspan="1" class="tbd-r">cell text 2</td>
                    <td colspan="1" class="tbd-r">cell text 3</td>
                </tr>
                <tr>
                    <td colspan="1" class="tbd">Row title 2</td>
                    <td colspan="1" class="tbd-r">cell text 4</td>
                    <td colspan="1" class="tbd-r">cell text 5</td>
                    <td colspan="1" class="tbd-r">cell text 6</td>
                </tr>
                <tr>
                    <td colspan="4" class="footnote">Some footnote</td>
                </tr>
                <tr>
                    <td colspan="4" class="source">One more footnote</td>
                </tr>
            </tbody>
        </table>
        <p class="text">Just a short text</p>
        <ul>
            <li>Some long text 3</li><li>Some long text 4</li>
        </ul>
    </body.content>
</body>

XSL:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:fo="http://www.w3.org/1999/XSL/Format" 
                              xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" indent="yes"/>

<xsl:template match="nitf">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

        <fo:layout-master-set>
            <fo:simple-page-master page-height="297mm" page-width="210mm"
                margin="5mm 25mm 5mm 25mm" master-name="simpleA4">
            <fo:region-body margin="20mm 0mm 20mm 0mm"/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <!-- NOTE: text part is OK! -->
        <fo:page-sequence master-reference="simpleA4">
            <fo:flow flow-name="xsl-region-body" >
                <fo:block>
                    <xsl:apply-templates select="head"/>
                    <!--xsl:apply-templates select="body"/ If it's uncommented, the table is not seen-->
                </fo:block>
                <fo:block>
                    <fo:table table-layout="fixed" border-style="solid">
                            <xsl:apply-templates select="tr" mode="theader"/>
                            <xsl:apply-templates select="tr" mode="tbody"/> 
                        <fo:table-body>
                            <xsl:apply-templates select="body/table/tbody/tr"/>
                        </fo:table-body>
                    </fo:table>
                </fo:block>
            </fo:flow>            
        </fo:page-sequence>
    </fo:root>
</xsl:template>

  <xsl:template match="tr">
       <fo:table-row>
      <xsl:apply-templates select="td"/>
    </fo:table-row>
  </xsl:template>
  
  <xsl:template match="td">
    <fo:table-cell border-style="solid">
      <fo:block><xsl:value-of select="."/></fo:block>
    </fo:table-cell>
  </xsl:template>

<!-- text -->
<xsl:template match="head">
    <fo:inline font-weight="bold">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

<xsl:template match="body.head">
    <fo:inline font-weight="bold">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

<xsl:template match="body.content">
    <xsl:apply-templates/>
</xsl:template>

<xsl:template match="p">
    <fo:block>
        <xsl:apply-templates/>
    </fo:block>
</xsl:template>
<xsl:template match="b">
    <fo:inline font-weight="bold">
        <xsl:apply-templates/>
    </fo:inline>
</xsl:template>

</xsl:stylesheet >
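Comparing this stylesheet with the XML suggests a likely cause of the error. In the root template the context node is nitf, so select="body/table/tbody/tr" matches nothing (the table sits inside body.content), and FOP receives an empty fo:table-body. The two apply-templates select="tr" calls also select nothing from that context. When apply-templates select="body" is uncommented, the built-in rules descend into the table and the tr template emits fo:table-row elements directly inside fo:block, outside any fo:table, which FOP would reject as well. Below is a minimal sketch of one way to restructure it, written against the sample above only: process the children of body.content in document order and let a template for table build the fo:table around its rows. It assumes the first row spans the full table width (the column count is derived from that row's colspan values) and leaves media-reference to a separate template; elements without a template fall back to XSLT's built-in rules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes so none end up directly inside fo:flow -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="nitf">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="simpleA4" page-height="297mm"
            page-width="210mm" margin="5mm 25mm 5mm 25mm">
          <fo:region-body margin="20mm 0mm 20mm 0mm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simpleA4">
        <fo:flow flow-name="xsl-region-body">
          <!-- Headline first, then body.content children in document order,
               so lists, tables and paragraphs stay interleaved as in the source -->
          <xsl:apply-templates select="body/body.head/hedline"/>
          <xsl:apply-templates select="body/body.content/*"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <xsl:template match="hedline">
    <fo:block font-weight="bold" font-size="14pt" space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <xsl:template match="p">
    <fo:block space-after="6pt"><xsl:apply-templates/></fo:block>
  </xsl:template>

  <xsl:template match="ul">
    <fo:list-block provisional-distance-between-starts="5mm" space-after="6pt">
      <xsl:apply-templates select="li"/>
    </fo:list-block>
  </xsl:template>

  <xsl:template match="li">
    <fo:list-item>
      <fo:list-item-label end-indent="label-end()">
        <fo:block>&#x2022;</fo:block>
      </fo:list-item-label>
      <fo:list-item-body start-indent="body-start()">
        <fo:block><xsl:apply-templates/></fo:block>
      </fo:list-item-body>
    </fo:list-item>
  </xsl:template>

  <!-- Build the whole fo:table here, so fo:table-row only ever appears
       inside fo:table-body; a table without rows produces no output -->
  <xsl:template match="table">
    <xsl:variable name="rows" select="tbody/tr | tr"/>
    <xsl:if test="$rows">
      <!-- Column count from the first row: sum of colspan values,
           a td without colspan counts as 1 -->
      <xsl:variable name="cols"
          select="sum($rows[1]/td/@colspan) + count($rows[1]/td[not(@colspan)])"/>
      <fo:table table-layout="fixed" width="100%" border-style="solid"
          space-after="6pt">
        <fo:table-column column-width="proportional-column-width(1)"
            number-columns-repeated="{$cols}"/>
        <fo:table-body>
          <xsl:apply-templates select="$rows"/>
        </fo:table-body>
      </fo:table>
    </xsl:if>
  </xsl:template>

  <xsl:template match="tr">
    <fo:table-row>
      <xsl:apply-templates select="td"/>
    </fo:table-row>
  </xsl:template>

  <xsl:template match="td">
    <fo:table-cell border-style="solid" padding="2pt">
      <!-- Carry the HTML-style colspan over to the FO spanning property -->
      <xsl:if test="@colspan > 1">
        <xsl:attribute name="number-columns-spanned">
          <xsl:value-of select="@colspan"/>
        </xsl:attribute>
      </xsl:if>
      <fo:block><xsl:apply-templates/></fo:block>
    </fo:table-cell>
  </xsl:template>

</xsl:stylesheet>
```

Building fo:table-body inside the table template, rather than in the root template, ties every generated row to the table it came from, so a document with several tables, or none, is handled without extra path logic in the root template.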


Solution

  • Google is your friend. I searched for "NITF XSL FO" and found this project: https://github.com/ydirson/serna-free/tree/master/serna/dist/plugins/nitf/nitf-xsl-serna

    If you are working with an industry-standard XML vocabulary, XSLs very likely already exist for HTML output, and often for XSL-FO as well.

    I cloned that project from GitHub. The XSLs are there, and they reference some others. You only need the "dist" directory and its subdirectories, but even that contains many things you do not need. If you examine the root "nitf.xsl", you will see:

    <xsl:import href="../../../xml/stylesheets/xslbricks/fo/fonts.xsl"/>
    <xsl:import href="../../../xml/stylesheets/xslbricks/fo/common.xsl"/>
    <xsl:import href="../../../xml/stylesheets/xslbricks/fo/layoutsetup.xsl"/>
    <xsl:import href="../../../xml/stylesheets/xslbricks/fo/default-elements.xsl"/>
    <xsl:import href="../../../xml/stylesheets/xslbricks/fo/page-sizes.xsl"/>
    <xsl:import href="../../../xml/stylesheets/xslbricks/fo/xhtml-tables.xsl"/>
    
    <xsl:include href="nitf-param.xsl"/>
    <xsl:include href="nitf-common.xsl"/>
    <xsl:include href="nitf-struct.xsl"/>
    <xsl:include href="nitf-meta.xsl"/>
    <xsl:include href="nitf-blocks.xsl"/>
    <xsl:include href="nitf-inlines.xsl"/>
    <xsl:include href="nitf-lists.xsl"/>
    <xsl:include href="nitf-images.xsl"/>
    <xsl:include href="nitf-tables.xsl"/>
    

    Those imported and included files are all the XSLs you need (unless some of them reference others; I did not check).

    Running your XML above through these XSLs (after adding the closing </nitf> tag you omitted) and formatting the resulting FO to PDF with Apache FOP yields this:

    [Screenshot: the resulting PDF]

    Of course, if you prefer, you could study those XSLs to see what you are doing wrong in your own XSL, but as you can see, a lot of work has already gone into them. I would always try to avoid reinventing the wheel.

    To reorganize all of that, you could isolate just the XSLs you need and, if you like, edit the main "nitf.xsl" so that it references all of them from a single directory. I did that and everything still works (so none of the XSLs I had not examined reference other files). My directory now contains only the files below; I deleted everything else:

    [Screenshot: directory listing of the remaining XSL files]
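Roughly, that edit to the main "nitf.xsl" amounts to pointing every href into the same directory. The fragment below is a sketch that assumes the six xslbricks/fo files were copied next to nitf.xsl under their original names; the nitf-*.xsl includes were already relative to nitf.xsl and stay as they are. The file names come from the import list quoted above; the flat layout is the assumption.

```xml
<!-- Top of the reorganized nitf.xsl (inside xsl:stylesheet).
     xsl:import elements must still come before every other top-level element. -->
<xsl:import href="fonts.xsl"/>
<xsl:import href="common.xsl"/>
<xsl:import href="layoutsetup.xsl"/>
<xsl:import href="default-elements.xsl"/>
<xsl:import href="page-sizes.xsl"/>
<xsl:import href="xhtml-tables.xsl"/>

<!-- These were already relative to nitf.xsl and are unchanged -->
<xsl:include href="nitf-param.xsl"/>
<xsl:include href="nitf-common.xsl"/>
<xsl:include href="nitf-struct.xsl"/>
<xsl:include href="nitf-meta.xsl"/>
<xsl:include href="nitf-blocks.xsl"/>
<xsl:include href="nitf-inlines.xsl"/>
<xsl:include href="nitf-lists.xsl"/>
<xsl:include href="nitf-images.xsl"/>
<xsl:include href="nitf-tables.xsl"/>
```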