xml xslt xsl-fo apache-fop

XSL-T & XSL-FO: restructure XML data to dynamically create page-sequences for every page


I'm not really sure how to describe my problem in English, so I'm hoping my example will make it clear what I am trying to do.

Let's say I have the following XML data:

<ROOT>
  <A>
    <ID>A1</ID>
    <DATA>
      <ENTRY>
        <ENTRYID>Entry1</ENTRYID>
        <ITEM1>Item1</ITEM1>
        <ITEM2>Item2</ITEM2>
        <ITEM3>Item3</ITEM3>
      </ENTRY>
      <ENTRY>
        <ENTRYID>Entry2</ENTRYID>
        <ITEM1>Item2_1</ITEM1>
        <ITEM2>Item2_1</ITEM2>
        <ITEM3>Item2_3</ITEM3>
      </ENTRY>
      ... even more entries...
    </DATA>
  </A>
  <A>
    <ID>A2</ID>
    <DATA>
      <ENTRY>
        <ENTRYID>Entry1</ENTRYID>
        <ITEM1>foo</ITEM1>
        <ITEM2>bar</ITEM2>
        <ITEM3>andsoon</ITEM3>
      </ENTRY>
      <ENTRY>
        <ENTRYID>Entry2</ENTRYID>
        <ITEM1>even</ITEM1>
        <ITEM2>more</ITEM2>
        <ITEM3>items</ITEM3>
      </ENTRY>
      ... even more entries...
    </DATA>
  </A>
  <A>
    .. as many A-Elements as you can think of...
  </A>
</ROOT>

There are no limits as to how many A-Elements can be in my XML data or how many ENTRY-Elements can be inside an A-Element.

I have an existing XSL file that puts all the data inside one big page sequence (XSL-FO). I'm using Apache FOP to process the XML and XSL, and the output format is PDF. I'm now running into memory issues when the XML data is very large. I've read a lot about tuning performance and memory consumption for large inputs, and I'm trying to split my data into one page sequence per page. The problem is that I don't know how to split or restructure the data before processing it in my stylesheet.

Right now, my stylesheet matches the nodes for A and ENTRY and formats the data into some neatly designed tables:

<xsl:template match="A">
  ... print fancy title for table with A/ID ...
  <fo:table>
    <fo:table-header>
       ... fancy table header here ...
    </fo:table-header>
    <fo:table-body>
      <xsl:apply-templates select="DATA/ENTRY"/>
      <fo:table-row>
        ... do some calculating for each A and create a sum table row ...
      </fo:table-row>
    </fo:table-body>
  </fo:table> 
</xsl:template>

<xsl:template match="ENTRY">
  <fo:table-row>
     ... print Entry data in table cells ...
  </fo:table-row>
</xsl:template>

The complete table for one A element can stretch over many hundreds of pages (worst case). I know how many ENTRY elements fit on one page. Because of the table header and the sum table row, the first and last pages of an A element fit fewer ENTRY elements than the pages in between. I need to split the data into appropriate chunks. As I have no influence on the structure of the XML file, I need to do this directly in the stylesheet.

I tried some things with xsl:key because keys work fine when grouping data, but I don't know whether they even work for my 'special' form of grouping and, if so, how.

So my resulting XSL should look like this:

<xsl:template match="/">
  <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>...</fo:layout-master-set>

      <fo:page-sequence master-reference="{$master}">

        <fo:flow flow-name="xsl-region-body" font-size="10pt">

          <xsl:apply-templates select="A"/>
          <xsl:apply-templates select="ENTRY elements for first page"/>

        </fo:flow>

      </fo:page-sequence>

      <fo:page-sequence master-reference="{$master}">

        <fo:flow flow-name="xsl-region-body" font-size="10pt">

          <xsl:apply-templates select="ENTRY elements for pages in between"/>

        </fo:flow>

      </fo:page-sequence>

      <fo:page-sequence master-reference="{$master}">

        <fo:flow flow-name="xsl-region-body" font-size="10pt">

          <xsl:apply-templates select="ENTRY elements for last page"/>

        </fo:flow>

      </fo:page-sequence>

  </fo:root>
</xsl:template>

Please note that the middle page-sequence has to be in a loop, and of course there can be more than one A element. I'm unsure how to loop appropriately over all the data for all page-sequences.


Solution

    1. You may be able to get by with starting a new fo:page-sequence for each A. Each page sequence would 'just' be up to several hundred pages instead of up to multiples of several hundred pages for the whole document.
    2. You can use recursion to select and process the next n ENTRY elements.
    3. Using XSLT 2.0 would likely produce clearer, more concise code, but since you're talking about using xsl:key to do grouping, it looks like you're using XSLT 1.0.
    4. Don't think about it in terms of looping; think about it in terms of selecting and processing what's in the source. After all, there are no looping variables to update in XSLT 1.0 or XSLT 2.0.

    With one fo:page-sequence per A, your template for A becomes:

    <xsl:template match="A">
      <fo:page-sequence master-reference="{$master}">
        <fo:flow flow-name="xsl-region-body" font-size="10pt">
           ... print fancy title for table with A/ID ...
           <fo:table>
             <fo:table-header>
               ... fancy table header here ...
             </fo:table-header>
             <fo:table-body>
               <xsl:apply-templates select="DATA/ENTRY"/>
               <fo:table-row>
                 ... do some calculating for each A and create a sum table row ...
               </fo:table-row>
             </fo:table-body>
           </fo:table>
         </fo:flow>
       </fo:page-sequence>
    </xsl:template>
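
With one fo:page-sequence per A, the root template can no longer wrap everything in a single page sequence; it only emits fo:root and the layout masters and leaves the page sequences to the A template. The following is a minimal sketch of that arrangement, not the asker's actual stylesheet: the default value of `master`, the placeholder page master named `a`, and the reduced A template are assumptions made to keep the example self-contained.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Assumed default; the question uses $master without showing its value. -->
  <xsl:param name="master" select="'a'"/>

  <xsl:template match="/">
    <fo:root>
      <fo:layout-master-set>
        <!-- Placeholder page master; substitute the real layout here. -->
        <fo:simple-page-master master-name="a">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- No page-sequence at this level: each A emits its own. -->
      <xsl:apply-templates select="ROOT/A"/>
    </fo:root>
  </xsl:template>

  <xsl:template match="A">
    <fo:page-sequence master-reference="{$master}">
      <fo:flow flow-name="xsl-region-body" font-size="10pt">
        <!-- Reduced to the title; the table for this A would follow here. -->
        <fo:block><xsl:value-of select="ID"/></fo:block>
      </fo:flow>
    </fo:page-sequence>
  </xsl:template>

</xsl:stylesheet>
```

Note that the select is `ROOT/A` because the template matches the document node, of which ROOT (not A) is the child.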
    

    The recursive solution presumably requires handling the single-page case as well as the multi-page case:

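    <!-- How many ENTRY rows fit on each kind of page. The values here are
         presumably small demo values; set them to the real counts. -->
    <xsl:param name="single-page-count" select="1" />
    <xsl:param name="first-page-count" select="2" />
    <xsl:param name="middle-page-count" select="3" />
    <xsl:param name="last-page-count" select="2" />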
    
    <xsl:template match="ROOT">
      <fo:root>
        <fo:layout-master-set>
          <fo:simple-page-master master-name="a">
            <fo:region-body/>
          </fo:simple-page-master>
        </fo:layout-master-set>
        <xsl:apply-templates select="A" />
      </fo:root>
    </xsl:template>
    
    <xsl:template match="A">
      <xsl:variable name="count"
                    select="count(DATA/ENTRY)" />
    
      <xsl:variable name="title">
        <xsl:call-template name="title" />
      </xsl:variable>
    
      <xsl:variable name="sum-row">
        <xsl:call-template name="sum-row" />
      </xsl:variable>
    
      <xsl:choose>
        <xsl:when test="$count &lt;= $single-page-count">
          <xsl:call-template name="page">
            <xsl:with-param name="title" select="$title" />
            <xsl:with-param name="rows">
              <xsl:apply-templates select="DATA/ENTRY"/>
            </xsl:with-param>
            <xsl:with-param name="sum-row" select="$sum-row" />
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="page">
            <xsl:with-param name="title" select="$title" />
            <xsl:with-param name="rows">
              <xsl:apply-templates
                  select="DATA/ENTRY[position() &lt;= $first-page-count]"/>
            </xsl:with-param>
          </xsl:call-template>
          <xsl:call-template name="other-pages">
            <xsl:with-param name="entries"
                            select="DATA/ENTRY[position() > $first-page-count]" />
            <xsl:with-param name="sum-row" select="$sum-row" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>
    
    <xsl:template name="other-pages">
      <xsl:param name="entries" />
      <xsl:param name="sum-row" />
      <xsl:variable name="count"
                    select="count($entries)" />
    
      <xsl:choose>
        <xsl:when test="$count &lt;= $last-page-count">
          <xsl:call-template name="page">
            <xsl:with-param name="rows">
              <xsl:apply-templates select="$entries"/>
            </xsl:with-param>
            <xsl:with-param name="sum-row" select="$sum-row" />
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="page">
            <xsl:with-param name="rows">
              <xsl:apply-templates
                  select="$entries[position() &lt;= $middle-page-count]"/>
            </xsl:with-param>
          </xsl:call-template>
          <xsl:call-template name="other-pages">
            <xsl:with-param name="entries"
                            select="$entries[position() > $middle-page-count]" />
            <xsl:with-param name="sum-row" select="$sum-row" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>
    
    <xsl:template name="page">
      <xsl:param name="title" />
      <xsl:param name="rows" />
      <xsl:param name="sum-row" />
    
      <fo:page-sequence master-reference="a">
        <fo:flow flow-name="xsl-region-body">
          <xsl:copy-of select="$title" />
          <fo:table>
            <fo:table-header>
              ... fancy table header here ...
            </fo:table-header>
            <fo:table-body>
              <xsl:copy-of select="$rows" />
              <xsl:copy-of select="$sum-row" />
            </fo:table-body>
          </fo:table>
        </fo:flow>
      </fo:page-sequence>
    </xsl:template>
    
    <xsl:template name="title">
      ... print fancy title for table <xsl:value-of select="ID"/> ...
    </xsl:template>
    
    <xsl:template name="sum-row">
      <fo:table-row>
        ... do some calculating for each A and create a sum table row ...
      </fo:table-row>
    </xsl:template>
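
For comparison with point 3 of the list above, here is a rough sketch of what the chunking might look like in XSLT 2.0, using xsl:for-each-group with a positional grouping key instead of recursion. It is a simplification under stated assumptions: it uses one uniform chunk size (the hypothetical parameter `entries-per-page`) rather than separate first, middle, and last page counts, the sum row is only a placeholder that prints the entry count, and it needs an XSLT 2.0 processor such as Saxon, since an XSLT 1.0 processor will not accept xsl:for-each-group.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- Hypothetical parameter: uniform number of ENTRY rows per page sequence.
       If supplied externally as a string, it would need casting to a number. -->
  <xsl:param name="entries-per-page" select="3"/>

  <xsl:template match="ROOT">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="a">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <xsl:apply-templates select="A"/>
    </fo:root>
  </xsl:template>

  <xsl:template match="A">
    <!-- Keep a handle on the current A; inside the group the context is an ENTRY. -->
    <xsl:variable name="a" select="."/>
    <!-- Entries 1..n get key 0, n+1..2n get key 1, and so on. -->
    <xsl:for-each-group select="DATA/ENTRY"
        group-adjacent="(position() - 1) idiv $entries-per-page">
      <fo:page-sequence master-reference="a">
        <fo:flow flow-name="xsl-region-body">
          <!-- Title only on the first chunk of this A. -->
          <xsl:if test="position() = 1">
            <fo:block><xsl:value-of select="$a/ID"/></fo:block>
          </xsl:if>
          <fo:table>
            <fo:table-body>
              <xsl:apply-templates select="current-group()"/>
              <!-- Placeholder sum row, only on the last chunk of this A. -->
              <xsl:if test="position() = last()">
                <fo:table-row>
                  <fo:table-cell>
                    <fo:block><xsl:value-of select="count($a/DATA/ENTRY)"/></fo:block>
                  </fo:table-cell>
                </fo:table-row>
              </xsl:if>
            </fo:table-body>
          </fo:table>
        </fo:flow>
      </fo:page-sequence>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="ENTRY">
    <fo:table-row>
      <fo:table-cell>
        <fo:block><xsl:value-of select="ENTRYID"/></fo:block>
      </fo:table-cell>
    </fo:table-row>
  </xsl:template>

</xsl:stylesheet>
```

Inside xsl:for-each-group, position() and last() refer to the groups rather than to individual entries, which is what makes the first-chunk and last-chunk tests work. Supporting different first and last page counts would mean shifting the grouping key by an offset, which this sketch does not attempt.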