Search code examples
xmlcsvxslt-1.0transformxslt-grouping

Converting CSV to hierarchichal XML using XSLT


I need to create an XSLT to convert a CSV (comma separated file) into hierarchical XML.

This is the input file:

<root>
L11,L12,L21,L22,L31,L32
1,A,1,C,1,G
1,A,1,C,2,H
1,A,2,D,1,I
1,A,2,D,2,J
2,B,1,E,1,K
2,B,1,E,2,L
2,B,2,F,1,M
2,B,2,F,2,N
</root>

This is desired output XML:

<?xml version="1.0" encoding="utf-8"?>
<Document>
  <Level1>
    <L11>1</L11>
    <L12>A</L12>
    <Level2>
      <L21>1</L11>
      <L22>C</L12>
      <Level3>
        <L31>1</L31>
        <L32>G</L32>
      </Level3>
      <Level3>
        <L31>2</L31>
        <L32>H</L32>
      </Level3>
    </Level2>
    <Level2>
      <L21>2</L11>
      <L22>D</L12>
      <Level3>
        <L31>1</L31>
        <L32>I</L32>
      </Level3>
      <Level3>
        <L31>2</L31>
        <L32>J</L32>
      </Level3>
    </Level2>
  </Level1>
  <Level1>
    <L11>2</L11>
    <L12>B</L12>
    <Level2>
      <L21>1</L11>
      <L22>E</L12>
      <Level3>
        <L31>1</L31>
        <L32>K</L32>
      </Level3>
      <Level3>
        <L31>2</L31>
        <L32>L</L32>
      </Level3>
    </Level2>
    <Level2>
      <L21>2</L11>
      <L22>F</L12>
      <Level3>
        <L31>1</L31>
        <L32>M</L32>
      </Level3>
      <Level3>
        <L31>2</L31>
        <L32>N</L32>
      </Level3>
    </Level2>
  </Level1>
</Document>

I've been trying to find some example online, however couldn't find anything similar. I've never done XSLT transformations before so I'd appreciate if you could point me in the right direction.

Update 1: I am thinking of a 2-step transformation. E.g. first step is to transform CSV to XML:

<?xml version="1.0" encoding="utf-8"?>
<Document>
  <row><L11>1</L11><L12>A</L12><L21>1</L12><L31>C</L31><L32>1</L31><L32>G</L32></row>
  <row><L11>1</L11><L12>A</L12><L21>1</L12><L31>C</L31><L32>2</L31><L32>H</L32></row>
  <row><L11>1</L11><L12>A</L12><L21>2</L12><L31>D</L31><L32>1</L31><L32>I</L32></row>
  <row><L11>1</L11><L12>A</L12><L21>2</L12><L31>D</L31><L32>2</L31><L32>J</L32></row>
  <row><L11>2</L11><L12>B</L12><L21>1</L12><L31>E</L31><L32>1</L31><L32>K</L32></row>
  <row><L11>2</L11><L12>B</L12><L21>1</L12><L31>E</L31><L32>2</L31><L32>L</L32></row>
  <row><L11>2</L11><L12>B</L12><L21>2</L12><L31>F</L31><L32>1</L31><L32>M</L32></row>
  <row><L11>2</L11><L12>B</L12><L21>2</L12><L31>F</L31><L32>2</L31><L32>N</L32></row>   
</Document>

And a second step is to transform that XML into the desired format using some sort of grouping. I don't mind having 2 transformations if there's no other way to achieve that.

Any advice please?

Update 2: Microsoft .NET Framework XSLT processor will be used.

If the abstract example is hard to read you can see a real-life example of the required transformation here: http://servingxml.sourceforge.net/examples/#timesheets-eg

As I understand, using a single transformation is impossible, so if someone could show me how to transform an XML from the Update 1 format to the desired XML format, half of the job would done and I will accept that answer.


Solution

  • For clarity, I have modified your input slightly, so that the labels make some sense:

    XML

    <root>
    GroupName,GroupValue,SubGroupName,SubGroupValue,ItemName,ItemValue
    1,A,1,C,1,G
    1,A,1,C,2,H
    1,A,2,D,1,I
    1,A,2,D,2,J
    2,B,1,E,1,K
    2,B,1,E,2,L
    2,B,2,F,1,M
    2,B,2,F,2,N
    </root>
    

    XSLT 1.0

    <xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    extension-element-prefixes="exsl">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    
    <xsl:key name="k1" match="row" use="cell[1]"/>
    <xsl:key name="k2" match="row" use="concat(cell[1], '|', cell[3])"/>
    
    <xsl:template match="/">
        <!-- tokenize csv -->
        <xsl:variable name="rows">
            <xsl:call-template name="tokenize">
                <xsl:with-param name="text" select="root"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="data">
            <xsl:for-each select="exsl:node-set($rows)/row[position() > 1]">
                <row>
                    <xsl:call-template name="tokenize">
                        <xsl:with-param name="text" select="."/>
                        <xsl:with-param name="delimiter" select="','"/>
                        <xsl:with-param name="name" select="'cell'"/>
                    </xsl:call-template>
                </row>
            </xsl:for-each>
        </xsl:variable>
        <!-- output -->
        <document>
            <xsl:for-each select="exsl:node-set($data)/row[count(. | key('k1', cell[1])[1]) = 1]">
                <group>
                    <name>
                        <xsl:value-of select="cell[1]"/>
                    </name>
                    <value>
                        <xsl:value-of select="cell[2]"/>
                    </value>
                    <xsl:for-each select="key('k1', cell[1])[count(. | key('k2', concat(cell[1], '|', cell[3]))[1]) = 1]">
                        <subgroup>
                            <name>
                                <xsl:value-of select="cell[3]"/>
                            </name>
                            <value>
                                <xsl:value-of select="cell[4]"/>
                            </value>
                            <items>
                                <xsl:for-each select="key('k2', concat(cell[1], '|', cell[3]))">
                                    <item>
                                        <name>
                                            <xsl:value-of select="cell[5]"/>
                                        </name>
                                        <value>
                                            <xsl:value-of select="cell[6]"/>
                                        </value>
                                    </item>
                                </xsl:for-each>
                            </items>
                        </subgroup>
                    </xsl:for-each>
                </group>
            </xsl:for-each>
        </document>
    </xsl:template>
    
    <xsl:template name="tokenize">
        <xsl:param name="text"/>
        <xsl:param name="delimiter" select="'&#10;'"/>
        <xsl:param name="name" select="'row'"/>
        <xsl:variable name="token" select="substring-before(concat($text, $delimiter), $delimiter)" />
        <xsl:if test="$token">
            <xsl:element name="{$name}">
                <xsl:value-of select="$token"/>
            </xsl:element>
        </xsl:if>
        <xsl:if test="contains($text, $delimiter)">
            <!-- recursive call -->
            <xsl:call-template name="tokenize">
                <xsl:with-param name="text" select="substring-after($text, $delimiter)"/>
                <xsl:with-param name="delimiter" select="$delimiter"/>
                <xsl:with-param name="name" select="$name"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:template>
    
    </xsl:stylesheet>
    

    Result

    <?xml version="1.0" encoding="UTF-8"?>
    <document>
       <group>
          <name>1</name>
          <value>A</value>
          <subgroup>
             <name>1</name>
             <value>C</value>
             <items>
                <item>
                   <name>1</name>
                   <value>G</value>
                </item>
                <item>
                   <name>2</name>
                   <value>H</value>
                </item>
             </items>
          </subgroup>
          <subgroup>
             <name>2</name>
             <value>D</value>
             <items>
                <item>
                   <name>1</name>
                   <value>I</value>
                </item>
                <item>
                   <name>2</name>
                   <value>J</value>
                </item>
             </items>
          </subgroup>
       </group>
       <group>
          <name>2</name>
          <value>B</value>
          <subgroup>
             <name>1</name>
             <value>E</value>
             <items>
                <item>
                   <name>1</name>
                   <value>K</value>
                </item>
                <item>
                   <name>2</name>
                   <value>L</value>
                </item>
             </items>
          </subgroup>
          <subgroup>
             <name>2</name>
             <value>F</value>
             <items>
                <item>
                   <name>1</name>
                   <value>M</value>
                </item>
                <item>
                   <name>2</name>
                   <value>N</value>
                </item>
             </items>
          </subgroup>
       </group>
    </document>
    

    Note:

    1. The element names are hard-coded into the stylesheet and not taken from the input (although that too would be possible with more effort);

    2. You may have to use the msxsl:node-set() function instead of the EXSLT one.