Using an XSLT mapping to correct the XML format of an input file


I am looking into whether the syntax of an input XML file can be corrected with an XSLT 1.0 mapping, but I have not been able to find any relevant blog posts. Is this even possible?

The input is a concatenation of multiple XML files. A sample input and the desired output are shown below. Please suggest an approach.

Input

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <CHILD BEGIN="1">
        <RECORD SEGMENT="1">
            <FIELD1>value1</FIELD1>
            <FIELD2>2</FIELD2>      
        </RECORD>       
    </CHILD>
</HEADER>
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <CHILD BEGIN="1">
        <RECORD SEGMENT="1">
            <FIELD1>value2</FIELD1>
            <FIELD2>3</FIELD2>

        </RECORD>   
    </CHILD>
</HEADER>

Output

<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <CHILD BEGIN="1">
        <RECORD SEGMENT="1">
            <FIELD1>value1</FIELD1>
            <FIELD2>2</FIELD2>      
        </RECORD>       
    </CHILD>

    <CHILD BEGIN="1">
        <RECORD SEGMENT="1">
            <FIELD1>value2</FIELD1>
            <FIELD2>3</FIELD2>

        </RECORD>   
    </CHILD>
</HEADER>


Solution

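  • XSLT 1.0 on its own cannot do this, because the processor has to parse the input as well-formed XML before any template runs, and a file with two XML declarations and two root elements fails at that step. In XSLT 2/3, as pointed out in a comment on the other answer, you could process the input as plain text instead, either by calling unparsed-text on the input file or by passing its content to the transformation as a string value/string item. The stylesheet below is XSLT 3.0 and takes the second route: it splits the text at each XML declaration, parses every piece with parse-xml-fragment, and merges root elements that share a name: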

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="#all"
      expand-text="yes">
    
      <xsl:output method="xml" indent="yes"/>
    
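      <!-- Matches the input when it is supplied as a single string item -->
      <xsl:template match=".[. instance of xs:string]" name="xsl:initial-template">
        
        <!-- Split the text into lines, start a new group at each XML declaration,
             drop the declaration line, and parse the rest of the group as XML -->
        <xsl:variable name="elements" as="element()*">
          <xsl:for-each-group select="tokenize(., '\r?\n')" group-starting-with=".[starts-with(., '&lt;?xml version')]">
            <xsl:sequence select="(current-group() => tail() => string-join('&#10;') => parse-xml-fragment()) ! *"/>
          </xsl:for-each-group>      
        </xsl:variable>
    
        <!-- Merge root elements that share a name: copy the first one's attributes
             and the children of every element in the group -->
        <xsl:for-each-group select="$elements" group-by="node-name()">
          <xsl:copy>
            <xsl:copy-of select="@*, current-group()!*"/>
          </xsl:copy>
        </xsl:for-each-group>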
        
      </xsl:template>
    
    </xsl:stylesheet>
    

    Online example passing input as string value/string item.
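
    The other route mentioned above, reading the file as text from inside the stylesheet, could look roughly like the following sketch. It is an untested variant of the stylesheet above, not part of the original answer: the parameter name input-uri and its default value input.xml are assumptions made for illustration, and unparsed-text-lines is used in place of unparsed-text plus tokenize. It needs an XSLT 3.0 processor and is meant to be started through the initial template (for example with Saxon's -it option), so no source document has to be parsed.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="#all">

      <!-- Hypothetical parameter: location of the concatenated input file.
           A relative URI is resolved against the stylesheet's base URI. -->
      <xsl:param name="input-uri" as="xs:string" select="'input.xml'"/>

      <xsl:output method="xml" indent="yes"/>

      <xsl:template name="xsl:initial-template">

        <!-- Read the file as plain text lines; each XML declaration starts a group -->
        <xsl:variable name="elements" as="element()*">
          <xsl:for-each-group select="unparsed-text-lines($input-uri)"
                              group-starting-with=".[starts-with(., '&lt;?xml version')]">
            <!-- Drop the declaration line and parse the remainder of the group -->
            <xsl:sequence select="parse-xml-fragment(string-join(tail(current-group()), '&#10;'))/*"/>
          </xsl:for-each-group>
        </xsl:variable>

        <!-- Same merge step as in the stylesheet above -->
        <xsl:for-each-group select="$elements" group-by="node-name()">
          <xsl:copy>
            <xsl:copy-of select="@*, current-group()!*"/>
          </xsl:copy>
        </xsl:for-each-group>

      </xsl:template>

    </xsl:stylesheet>
    ```

    Both versions assume that every XML declaration sits at the very start of its own line, as in the sample input. If a declaration can appear in the middle of a line, the text would have to be split differently before parsing.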