Tags: xml, ini, xslt, xmlstarlet

Transform INI (or any generic legacy flat file) to XML with XSL, using xmlstarlet or xsltproc?


I'm looking to do some sort of transform from INI to XML. The INI syntax is simple. I don't want to use sed/awk/grep; this really should be done with XML tools.

Can this be done with regular XSL? I have heard of XFlat, but can I do this with tools compiled in C, such as xsltproc or xmlstarlet?

Generic INI syntax is like this...

[section]
option = values

which would be in xml like this...

<section>
<option>values</option>
</section>

Any help would be much appreciated.


Solution

  • Can this be done with regular XSL?

    Yes, and XSLT 2.0 provides more facilities than XSLT 1.0 for processing text. Very complex text processing has been implemented in XSLT, including a general LR(1) parser, used for building parsers for specific grammars, such as JSON and XPath.

    In particular, learn about unparsed-text(), the various string functions, including the ones that allow using regular expressions (matches(), tokenize() and replace()) and also the <xsl:analyze-string> instruction.
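
    As a minimal sketch of the regular-expression facilities (not a full solution), the stylesheet below parses a single "option = values" line with <xsl:analyze-string>. The line parameter and its default value are assumptions made for illustration only.

    ```xml
    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes"/>

      <!-- Hypothetical parameter: one INI line to be parsed -->
      <xsl:param name="line" select="'option = values'"/>

      <xsl:template match="/">
        <!-- Group 1 is the option name, group 2 is everything after '=' -->
        <xsl:analyze-string select="$line"
            regex="^\s*([^=\s]+)\s*=\s*(.*)$">
          <xsl:matching-substring>
            <xsl:element name="{regex-group(1)}">
              <xsl:value-of select="normalize-space(regex-group(2))"/>
            </xsl:element>
          </xsl:matching-substring>
        </xsl:analyze-string>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Applied to any XML document with an XSLT 2.0 processor, this should produce <option>values</option>. A line without an "=" sign does not match the regular expression and produces no output.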

    XSLT 1.0 also has string functions (as provided by XPath 1.0), but it lacks regular-expression support and has no equivalent of the XSLT 2.0 function unparsed-text(). Among the most useful XPath 1.0 string functions are: substring(), substring-before(), substring-after(), starts-with(), string-length(), concat(), and especially the translate() function.
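
    To illustrate how far the XPath 1.0 string functions go, here is a small XSLT 1.0 sketch that splits one "option = values" line at the first "=" sign. The line parameter is a hypothetical name chosen for this example.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes"/>

      <!-- Hypothetical parameter: one INI line, supplied by the caller -->
      <xsl:param name="line" select="'option = values'"/>

      <xsl:template match="/">
        <!-- Text before the first '=' is the name, text after it is the value;
             normalize-space() trims the surrounding whitespace -->
        <xsl:variable name="key"
            select="normalize-space(substring-before($line, '='))"/>
        <xsl:variable name="value"
            select="normalize-space(substring-after($line, '='))"/>

        <xsl:element name="{$key}">
          <xsl:value-of select="$value"/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>
    ```

    With xsltproc, the parameter can be set on the command line via --stringparam, for example: xsltproc --stringparam line "name = demo" split-line.xsl any.xml (the file names are placeholders). This should output <name>demo</name>.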

    One can "read" a file by using an entity in a DTD, as Mads Hansen has explained in his answer. Another way is to read the file in the program that initiates the transformation, then to pass the file's content as a string parameter to the transformation.
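
    The entity technique can be sketched as a small wrapper document like the one below. The wrapper element name ini, the entity name inifile and the relative path test.ini are all assumptions for illustration.

    ```xml
    <?xml version="1.0"?>
    <!-- The external entity pulls the text of test.ini into the document -->
    <!DOCTYPE ini [
      <!ENTITY inifile SYSTEM "test.ini">
    ]>
    <ini>&inifile;</ini>
    ```

    A stylesheet applied to this wrapper can then take string(/ini) as the text to parse. Two caveats: the INI content must not contain "<" or "&", because the entity is parsed as XML, and whether external entities are expanded at all depends on the processor and its options.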

    Update: The OP has now provided specific data, so that a complete solution is possible:

    <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>

      <!-- Read the whole INI file as one string -->
      <xsl:variable name="vText"
          select="unparsed-text('file:///c:/temp/delete/test.ini')"/>

      <!-- Split into lines (CRLF or LF); the [.] predicate drops empty lines -->
      <xsl:variable name="vLines" as="xs:string*"
          select="tokenize($vText, '&#xD;?&#xA;')[.]"/>

      <xsl:variable name="vLineCnt" select="count($vLines)"/>

      <!-- Indexes of the lines that start a section, e.g. [section1] -->
      <xsl:variable name="vSectLinesInds" as="xs:integer*"
          select="for $i in 1 to $vLineCnt
                    return
                      if (starts-with(normalize-space($vLines[$i]), '['))
                        then $i
                        else ()"/>

      <xsl:variable name="vSectCnt" select="count($vSectLinesInds)"/>

      <xsl:template match="/">
        <xsl:for-each select="$vSectLinesInds">
          <xsl:variable name="vPos" select="position()"/>
          <xsl:variable name="vInd" as="xs:integer" select="."/>

          <xsl:variable name="vthisLine" as="xs:string"
              select="$vLines[$vInd]"/>

          <!-- Index of the next section header, or one past the last line -->
          <xsl:variable name="vNextSectInd"
              select="if ($vPos eq $vSectCnt)
                        then $vLineCnt + 1
                        else $vSectLinesInds[$vPos + 1]"/>

          <!-- Option lines that belong to the current section -->
          <xsl:variable name="vInnerLines"
              select="$vLines[position() gt $vInd
                              and position() lt $vNextSectInd]"/>

          <!-- Section name: the text between '[' and ']' -->
          <xsl:variable name="vName"
              select="tokenize($vthisLine, '\[|\]')[2]"/>

          <xsl:element name="{$vName}">
            <xsl:for-each select="$vInnerLines">
              <!-- Split "option = value" at the '=' sign -->
              <xsl:variable name="vInnerParts"
                  select="tokenize(., '[ ]*=[ ]*')"/>

              <xsl:element name="{$vInnerParts[1]}">
                <xsl:value-of select="$vInnerParts[2]"/>
              </xsl:element>
            </xsl:for-each>
          </xsl:element>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>
    

    When this transformation is applied to any XML document (its content is not used) with an XSLT 2.0 processor, and the file at C:\temp\delete\test.ini has the following content:

    [section1]
    option1 = values1
    option2 = values2
    option3 = values3
    option4 = values4
    option5 = values5
    
    [section2]
    option1 = values1
    option2 = values2
    option3 = values3
    option4 = values4
    option5 = values5
    
    [section3]
    option1 = values1
    option2 = values2
    option3 = values3
    option4 = values4
    option5 = values5
    

    the wanted, correct result is produced:

    <section1>
       <option1>values1</option1>
       <option2>values2</option2>
       <option3>values3</option3>
       <option4>values4</option4>
       <option5>values5</option5>
    </section1>
    <section2>
       <option1>values1</option1>
       <option2>values2</option2>
       <option3>values3</option3>
       <option4>values4</option4>
       <option5>values5</option5>
    </section2>
    <section3>
       <option1>values1</option1>
       <option2>values2</option2>
       <option3>values3</option3>
       <option4>values4</option4>
       <option5>values5</option5>
    </section3>
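
    Since xsltproc and xmlstarlet are built on libxslt, which implements XSLT 1.0 only, the 2.0 stylesheet above will not run there. The following is a rough XSLT 1.0 sketch of the same idea using recursive named templates. It assumes the INI text is passed in as a string parameter (here called ini, a name chosen for this example), that section and option names are valid XML element names, and that values contain no "[" or "]" characters.

    ```xml
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output omit-xml-declaration="yes" indent="yes"/>

      <!-- Assumption: the caller supplies the whole INI text in this parameter -->
      <xsl:param name="ini"/>

      <xsl:template match="/">
        <xsl:call-template name="sections">
          <xsl:with-param name="text" select="$ini"/>
        </xsl:call-template>
      </xsl:template>

      <!-- Emits one element per [section], then recurses on the remaining text -->
      <xsl:template name="sections">
        <xsl:param name="text"/>
        <xsl:if test="contains($text, '[')">
          <xsl:variable name="rest" select="substring-after($text, '[')"/>
          <!-- Everything after the closing ']' of this section header -->
          <xsl:variable name="body" select="substring-after($rest, ']')"/>
          <xsl:element name="{normalize-space(substring-before($rest, ']'))}">
            <xsl:call-template name="options">
              <!-- This section's lines: text up to the next '[' (or to the end) -->
              <xsl:with-param name="text"
                  select="substring-before(concat($body, '['), '[')"/>
            </xsl:call-template>
          </xsl:element>
          <xsl:call-template name="sections">
            <xsl:with-param name="text" select="$body"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:template>

      <!-- Emits one element per "name = value" line, one line per call -->
      <xsl:template name="options">
        <xsl:param name="text"/>
        <xsl:variable name="line"
            select="substring-before(concat($text, '&#10;'), '&#10;')"/>
        <!-- Lines without '=' (for example blank lines) are skipped -->
        <xsl:if test="contains($line, '=')">
          <xsl:element name="{normalize-space(substring-before($line, '='))}">
            <xsl:value-of select="normalize-space(substring-after($line, '='))"/>
          </xsl:element>
        </xsl:if>
        <xsl:if test="contains($text, '&#10;')">
          <xsl:call-template name="options">
            <xsl:with-param name="text" select="substring-after($text, '&#10;')"/>
          </xsl:call-template>
        </xsl:if>
      </xsl:template>
    </xsl:stylesheet>
    ```

    From a POSIX shell, something like xsltproc --stringparam ini "$(cat test.ini)" ini2xml.xsl any.xml should feed the file in (the file names are placeholders). Because the templates recurse once per line and once per section, very large files may run into the processor's recursion depth limit.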