Search code examples
xmlxsltlibreoffice-calcexport-to-xml

Handling repeating rows when exporting spreadsheet to XML using XSLT


I have a very large spreadsheet that I'm trying to export to XML. Any row with repeating cells are skipped over during the transformation. So if a row has 1 2 2 2 3 4 5 only the first 2 will be copied to the XML file. I've found out this is because of how the spreadsheet stores repeating rows. I need my XSLT file to handle repeating cells in a row instead of skipping them.

I'm using LibreOffice Calc to do the transformations.

Here's the part of my sample .xslt file that does the heavy lifting:

<xsl:template match="office:spreadsheet/table:table">

   <xsl:for-each select="table:table-row[position() &gt; 0]">

   <xsl:text>&#xa;</xsl:text>
   <xsl:text>&#xa;</xsl:text>

   <Column>

     <xsl:text>&#xa;</xsl:text>

     <p1><xsl:value-of select="table:table-cell[1]/text:p" /></p1> 
     <p1><xsl:value-of select="table:table-cell[2]/text:p" /></p1> 
     <p2><xsl:value-of select="table:table-cell[3]/text:p" /></p2> 
     <p3><xsl:value-of select="table:table-cell[4]/text:p" /></p3> 
     <p4><xsl:value-of select="table:table-cell[5]/text:p" /></p4> 
     <p5><xsl:value-of select="table:table-cell[6]/text:p" /></p5> 
     <p6><xsl:value-of select="table:table-cell[7]/text:p" /></p6> 

      <xsl:text>&#xa;</xsl:text>

      </Column>
   </xsl:for-each>
</xsl:template>

Edit: Here's what my sample spreadsheet looks like in LibreOffice:

1   2   3   4   5
6   7   8   9   10
12  12  13  14  15
15  15  15  15  15
17  17  18  18  17

And the resulting XML file:

<?xml version="1.0"?>
<root>

<Column>
<p1>1</p1><p1>2</p1><p2>3</p2><p3>4</p3><p4>5</p4><p5/><p6/>
</Column>

<Column>
<p1>6</p1><p1>7</p1><p2>8</p2><p3>9</p3><p4>10</p4><p5/><p6/>
</Column>

<Column>
<p1>12</p1><p1>13</p1><p2>14</p2><p3>15</p3><p4/><p5/><p6/>
</Column>

<Column>
<p1>15</p1><p1/><p2/><p3/><p4/><p5/><p6/>
</Column>

 <Column>
 <p1>17</p1><p1>18</p1><p2>17</p2><p3/><p4/><p5/><p6/>
 </Column></root>

Solution

  • In Open Document *.ods files cells which are identically are not stored redundant. Instead the first cell will have a attribute table:number-columns-repeated which tells the count of repeating: <table:table-cell table:number-columns-repeated="2" ...>.

    So XSLT which transforms this to XMLwhich needs all elements must generating elements repeatedly. Since we need using XSLT 1.0, this is not as simple as it should. We need a template which calls itself recursively.

    Also your numbered element names p1, p2, ... does not making things easier. To achieve this and having the numbering according the resulting output elements instead of the input elements we must collecting the output elements in a variable first and then numbering that collection.

    And your elements names are confusing. What you are calling Column are rows and the p elements are cells. So I have renamed them accordingly.

    So having the following sheet:

    enter image description here

    and using export XSLT of:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:exsl="http://exslt.org/common" 
     xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" 
     xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0" 
     xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" exclude-result-prefixes="office table text exsl">
    
     <xsl:template match="/">
      <xsl:element name="sheet">
       <xsl:apply-templates select="/*/office:body" />
      </xsl:element>
     </xsl:template>
    
     <xsl:template match="office:body">
      <xsl:apply-templates />
     </xsl:template>
    
     <xsl:template match="office:spreadsheet">
      <xsl:apply-templates />
     </xsl:template>
    
     <xsl:template match="office:spreadsheet/table:table">
    
      <xsl:for-each select="table:table-row">
       <xsl:element name="row">
    
       <xsl:variable name="thecells">
    
        <xsl:for-each select="table:table-cell">
    
         <xsl:variable name="repeated">
          <xsl:choose>
           <xsl:when test="@table:number-columns-repeated">
            <xsl:value-of select="@table:number-columns-repeated" />
           </xsl:when>
           <xsl:otherwise>
            <xsl:text>1</xsl:text>
           </xsl:otherwise>
          </xsl:choose>
         </xsl:variable>
    
         <xsl:call-template name="repeatecells">
          <xsl:with-param name="start" select="1" />
          <xsl:with-param name="end" select="$repeated"/>
         </xsl:call-template>
    
        </xsl:for-each>
    
       </xsl:variable>
    
       <xsl:for-each select="exsl:node-set($thecells)/cell">
        <xsl:element name="{concat('cell', position())}"><xsl:value-of select="current()" /></xsl:element>
       </xsl:for-each>
    
       </xsl:element>
      </xsl:for-each>
    
     </xsl:template>
    
     <xsl:template name="repeatecells">
      <xsl:param name="start"/>
      <xsl:param name="end"/>
    
      <xsl:if test="not($start > $end)">
       <xsl:choose>
        <xsl:when test="$start = $end">
         <xsl:element name="cell"><xsl:value-of select="text:p" /></xsl:element>
        </xsl:when>
        <xsl:otherwise>
         <xsl:variable name="mid" select= "floor(($start + $end) div 2)"/>
         <xsl:call-template name="repeatecells">
          <xsl:with-param name="start" select="$start"/>
          <xsl:with-param name="end" select="$mid"/>
         </xsl:call-template>
         <xsl:call-template name="repeatecells">
          <xsl:with-param name="start" select="$mid+1"/>
          <xsl:with-param name="end" select="$end"/>
         </xsl:call-template>
        </xsl:otherwise>
       </xsl:choose>
      </xsl:if>
     </xsl:template>
    
    </xsl:stylesheet>
    

    results in:

    <?xml version="1.0"?>
    <sheet>
    <row><cell1>P1</cell1><cell2>P2</cell2><cell3>P3</cell3><cell4>P4</cell4><cell5>P5</cell5><cell6>P6</cell6></row>
    <row><cell1>1</cell1><cell2>1</cell2><cell3>1</cell3><cell4>1</cell4><cell5>2</cell5><cell6>2</cell6></row>
    <row><cell1>1</cell1><cell2>2</cell2><cell3>2</cell3><cell4>2</cell4><cell5>2</cell5><cell6>3</cell6></row>
    <row><cell1>1</cell1><cell2>1</cell2><cell3>1</cell3><cell4>2</cell4><cell5>2</cell5><cell6>2</cell6></row>
    <row><cell1/><cell2>1</cell2><cell3>1</cell3><cell4/><cell5>2</cell5><cell6>2</cell6></row>
    <row><cell1>1</cell1><cell2/><cell3/><cell4>2</cell4><cell5/><cell6/></row></sheet>