
How to transform a CSV file into a structured XML file using XSLT 2.0?


I want to transform the CSV below into XML.

Example CSV Input

01,TeacherHeader1
02,StudentHeader1
03,SubjectHeader1
10,Grade1,Score99
10,Grade2,Score99
48,SubjectTrailer1
49,StudentTrailer1
02,StudentHeader2
03,SubjectHeader1
10,Grade1,Score50
10,Grade2,Score50
48,SubjectTrailer1
49,StudentTrailer2
50,TeacherTrailer1

The output should be:

  <FileHeader> 
    <id>01</id>  
    <name>TeacherHeader1</name> 
  </FileHeader>  
  <GroupRecord> 
    <GroupHeader> 
      <id>02</id>  
      <name>StudentHeader1</name> 
    </GroupHeader>  
    <AccountRecord> 
      <AccountHeader> 
        <id>03</id>  
        <name>SubjectHeader1</name> 
      </AccountHeader>  
      <AccountDetails> 
        <Details> 
          <id>10</id>  
          <name>Grade1</name>  
          <value>Score99</value> 
        </Details>  
        <Details> 
          <id>10</id>  
          <name>Grade2</name>  
          <value>Score99</value> 
        </Details> 
      </AccountDetails>  
      <AccountTrailer> 
        <id>48</id>  
        <name>SubjectTrailer1</name> 
      </AccountTrailer> 
    </AccountRecord>  
    <GroupTrailer> 
      <id>49</id>  
      <name>StudentTrailer1</name> 
    </GroupTrailer> 
  </GroupRecord>  
  <GroupRecord> 
    <GroupHeader> 
      <id>02</id>  
      <name>StudentHeader2</name> 
    </GroupHeader>  
    <AccountRecord> 
      <AccountHeader> 
        <id>03</id>  
        <name>SubjectHeader1</name> 
      </AccountHeader>  
      <AccountDetails> 
        <Details> 
          <id>10</id>  
          <name>Grade1</name>  
          <value>Score50</value> 
        </Details>  
        <Details> 
          <id>10</id>  
          <name>Grade2</name>  
          <value>Score50</value> 
        </Details> 
      </AccountDetails>  
      <AccountTrailer> 
        <id>48</id>  
        <name>SubjectTrailer1</name> 
      </AccountTrailer> 
    </AccountRecord>  
    <GroupTrailer> 
      <id>49</id>  
      <name>StudentTrailer2</name> 
    </GroupTrailer> 
  </GroupRecord>  
  <FileTrailer> 
    <id>50</id>  
    <name>TeacherTrailer1</name> 
  </FileTrailer> 

where

01 = FileHeader 
02 = GroupHeader (grouped inside GroupRecord)
03 = AccountHeader (grouped inside AccountRecord)
10 = Details (grouped inside AccountDetails)
48 = AccountTrailer (grouped inside AccountRecord)
49 = GroupTrailer (grouped inside GroupRecord)
50 = FileTrailer  

I want to transform the CSV above into properly structured XML like the output shown. Any help would be greatly appreciated. Thanks.


Solution

  • As I said in a comment, you can read the text file with unparsed-text and split it with tokenize to convert it to XML (or use unparsed-text-lines and tokenize if XSLT 3 is available). The rest of the task can be done with nested xsl:for-each-group instructions, perhaps even with one or two recursive functions once a regular pattern has been established. The following spells out the nested xsl:for-each-group instructions:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        expand-text="yes"
        xmlns:mf="http://example.com/mf"
        exclude-result-prefixes="#all"
        version="3.0">
    
      <xsl:param name="data" as="xs:string">01,TeacherHeader1
    02,StudentHeader1
    03,SubjectHeader1
    10,Grade1,Score99
    10,Grade2,Score99
    48,SubjectTrailer1
    49,StudentTrailer1
    02,StudentHeader2
    03,SubjectHeader1
    10,Grade1,Score50
    10,Grade2,Score50
    48,SubjectTrailer1
    49,StudentTrailer2
    50,TeacherTrailer1</xsl:param>
    
    <xsl:param name="header-ids" as="xs:string*"
      select="'01', '02', '03', '10', '48', '49', '50'"/>
    
    <xsl:param name="header-names" as="xs:string*"
      select="'FileHeader', 'GroupHeader', 'AccountHeader', 'Details', 'AccountTrailer', 'GroupTrailer', 'FileTrailer'"/>
    
      <xsl:variable name="lines">
          <xsl:for-each select="tokenize($data, '\r?\n')">
              <line>
                  <xsl:variable name="tokens" as="xs:string*" select="tokenize(., ',')"/>
                  <id>{$tokens[1]}</id>
                  <name>{$tokens[2]}</name>
                  <xsl:if test="$tokens[3]">
                      <value>{$tokens[3]}</value>
                  </xsl:if>
              </line>
          </xsl:for-each>
      </xsl:variable>
    
      <xsl:mode on-no-match="shallow-copy"/>
    
      <xsl:output method="xml" indent="yes"/>
    
      <xsl:template match="/" name="xsl:initial-template">
          <xsl:for-each-group select="$lines/line" group-starting-with="line[id = '01']">
              <File>
                  <xsl:apply-templates select="."/>
                  <xsl:for-each-group select="current-group() except ." group-ending-with="line[id = '50']">
                      <xsl:for-each-group select="current-group()[position() lt last()]" group-starting-with="line[id = '02']">
                          <GroupRecord>
                              <xsl:apply-templates select="."/>
                              <xsl:for-each-group select="current-group() except ." group-ending-with="line[id = '49']">
                                  <xsl:for-each-group select="current-group()[position() lt last()]" group-starting-with="line[id = '03']">
                                      <AccountRecord>
                                          <xsl:apply-templates select="."/>
                                          <AccountDetails>
                                              <xsl:apply-templates select="(current-group() except .)[id != '48']"/>
                                          </AccountDetails>
                                          <xsl:apply-templates select="current-group()[id = '48']"/>
                                      </AccountRecord>
                                  </xsl:for-each-group>
                                  <xsl:apply-templates select="current-group()[last()]"/>
                              </xsl:for-each-group>
                          </GroupRecord>
                      </xsl:for-each-group>
                      <xsl:apply-templates select="current-group()[last()]"/>
                  </xsl:for-each-group>
              </File>
          </xsl:for-each-group>
      </xsl:template>
    
      <xsl:template match="line">
          <xsl:element name="{$header-names[index-of($header-ids, current()/id)]}">
              <xsl:apply-templates/>
          </xsl:element>
      </xsl:template>
    
    </xsl:stylesheet>
    

    Online example: https://xsltfiddle.liberty-development.net/gWEaSv8. The sample data is inlined to keep the example complete and compact, but you could of course use <xsl:param name="data" as="xs:string" select="unparsed-text('file.txt')"/> instead. I also used the xsl:mode declaration and name="xsl:initial-template", both XSLT 3 features; for an XSLT 2 processor you would need to spell out the identity transformation and use a different template name, e.g. name="main", as the entry point. I also used text value templates like <id>{$tokens[1]}</id> (enabled by expand-text="yes"); for an XSLT 2 processor you would need to use e.g. <id><xsl:value-of select="$tokens[1]"/></id>.
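The XSLT 2 adaptations described above could look roughly like the following minimal sketch. It is not tested against any particular processor. The file name file.txt, its UTF-8 encoding, and the template name main are assumptions for illustration. To keep the sketch short, the body of the main template only renames the flat lines; the nested xsl:for-each-group instructions from the stylesheet above use no XSLT 3 features and can be pasted in its place unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="2.0">

  <!-- Assumption: file.txt sits next to the stylesheet and is UTF-8 encoded -->
  <xsl:param name="data" as="xs:string" select="unparsed-text('file.txt', 'UTF-8')"/>

  <xsl:param name="header-ids" as="xs:string*"
      select="'01', '02', '03', '10', '48', '49', '50'"/>

  <xsl:param name="header-names" as="xs:string*"
      select="'FileHeader', 'GroupHeader', 'AccountHeader', 'Details', 'AccountTrailer', 'GroupTrailer', 'FileTrailer'"/>

  <!-- Temporary tree of line elements, built with xsl:value-of instead of
       text value templates; the predicate skips blank lines such as the one
       produced by a trailing newline at the end of the file -->
  <xsl:variable name="lines">
    <xsl:for-each select="tokenize($data, '\r?\n')[normalize-space()]">
      <xsl:variable name="tokens" as="xs:string*" select="tokenize(., ',')"/>
      <line>
        <id><xsl:value-of select="$tokens[1]"/></id>
        <name><xsl:value-of select="$tokens[2]"/></name>
        <xsl:if test="$tokens[3]">
          <value><xsl:value-of select="$tokens[3]"/></value>
        </xsl:if>
      </line>
    </xsl:for-each>
  </xsl:variable>

  <xsl:output method="xml" indent="yes"/>

  <!-- Identity template, spelled out in place of
       xsl:mode on-no-match="shallow-copy" -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Entry point: a named template replaces xsl:initial-template -->
  <xsl:template match="/" name="main">
    <File>
      <!-- Placeholder: put the nested xsl:for-each-group instructions here -->
      <xsl:apply-templates select="$lines/line"/>
    </File>
  </xsl:template>

  <!-- Rename each line according to its id, as in the stylesheet above -->
  <xsl:template match="line">
    <xsl:element name="{$header-names[index-of($header-ids, current()/id)]}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Since there is no source document, the processor has to be started at the named template, for example with the -it:main option of the Saxon command line.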