I searched and came close to finding a solution, but it requires XSLT 2.0 and I'm stuck on XSLT 1.0.
This is the sample XML I have:
<root>
<row>A1: Apples</row>
<row>B1: Red</row>
<row>C1: Reference text</row>
<row>badly formatted text which belongs to row above</row>
<row>and here.</row>
<row>D1: ABC</row>
<row>E1: 123</row>
<row>A1: Oranges</row>
<row>B1: Purple</row>
<row>C1: More References</row>
<row>with no identifier</row>
<row>again and here.</row>
<row>D1: DEF</row>
<row>E1: 456</row>
...
</root>
I want it to look like:
<root>
<row>
<A1>Apples</A1>
<B1>Red</B1>
<C1>Reference text badly formatted text which belongs to row above and here.</C1>
<D1>ABC</D1>
<E1>123</E1>
</row>
<row>
<A1>Oranges</A1>
<B1>Purple</B1>
<C1>More References with no identifier again and here.</C1>
<D1>DEF</D1>
<E1>456</E1>
</row>
...
</root>
There is a pattern here that I can convert with other utilities, but it's quite hard with XSLT 1.0.
The text in each element starts with a heading (A1, B1, ...) that I can use. The reference text field is multi-line; when it gets converted to XML, each line becomes its own row, but those rows always sit between C1 and D1. The actual names of the elements are not important.
Each output row should end after E1. I think my example is straightforward, but the transformation is not. I don't even consider myself a beginner at XML/XSLT: I learn from scratch, get shifted to other projects, and then have to come back to it again. TIA.
Update: I ran into another case with a slightly different structure, but I want the result to be the same:
<root>
<row>
<Field>A1: Apples</Field>
</row>
<row>
<Field>B1: Red</Field>
</row>
<row>
<Field>C1: Reference text</Field>
</row>
<row>
<Field>badly formatted text which belongs to row above</Field>
</row>
<row>
<Field>and here.</Field>
</row>
<row>
<Field>D1: ABC</Field>
</row>
<row>
<Field>E1: 123</Field>
</row>
<row>
<Field>A1: Oranges</Field>
</row>
<row>
<Field>B1: Purple</Field>
</row>
<row>
<Field>C1: More References</Field>
</row>
<row>
<Field>with no identifier</Field>
</row>
<row>
<Field>again and here.</Field>
</row>
<row>
<Field>D1: DEF</Field>
</row>
<row>
<Field>E1: 456</Field>
</row>
</root>
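I tried applying an identity transform, but it didn't seem to work:
<!-- Identity transform: copy every attribute and node unchanged -->
<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>
<!-- Drop the <Field> wrapper and keep only its content -->
<xsl:template match="row/Field">
  <xsl:apply-templates/>
</xsl:template>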
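On their own, these two templates copy the document and strip the <Field> wrapper, so each row ends up holding just its line of text (roughly the first layout), but nothing in them groups rows into records. One way to reuse the idea, sketched below, is as a cleanup pass that turns the Field layout into the flat layout of the first example, whose output is then fed to a grouping stylesheet. This assumes you can chain two transformations (for example by running the XSLT 1.0 processor twice); the row[Field] match and the use of xsl:strip-space and normalize-space() are choices made for this sketch so that the flattened rows don't carry the line breaks and indentation that surrounded each <Field>.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Ignore whitespace-only text nodes, e.g. the indentation around each <Field> -->
  <xsl:strip-space elements="*"/>

  <!-- Identity transform: copy everything that no other template matches -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- <row><Field>A1: Apples</Field></row> becomes <row>A1: Apples</row> -->
  <xsl:template match="row[Field]">
    <row>
      <xsl:value-of select="normalize-space(Field)"/>
    </row>
  </xsl:template>
</xsl:stylesheet>
```

Input in the first layout passes through this step unchanged apart from whitespace, because none of its rows contain a <Field>.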
This looks kind of tricky, but I have a solution which seems to work. It allows for a variable number of rows after the C1 row (it wasn't clear whether this was always 2 rows or not).
The solution makes heavy use of the following-sibling axis. In particular, the C1 handling re-counts the remaining A1 rows for every candidate sibling, so it is probably very inefficient for a large input file.
You can test it out here.
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">
    <!-- Wrap all records in a single <root> so the output is well-formed -->
    <root>
      <!-- Loop through every "A1" row -->
      <xsl:for-each select="row[substring-before(text(), ':') = 'A1']">
        <!-- Add a <row> tag -->
        <xsl:element name="row">
          <!-- Add each of the A1-E1 tags by finding the first following-sibling that matches before the colon -->
          <xsl:apply-templates select="." />
          <xsl:apply-templates select="following-sibling::*[substring-before(text(), ':') = 'B1'][1]" />
          <xsl:apply-templates select="following-sibling::*[substring-before(text(), ':') = 'C1'][1]" />
          <xsl:apply-templates select="following-sibling::*[substring-before(text(), ':') = 'D1'][1]" />
          <xsl:apply-templates select="following-sibling::*[substring-before(text(), ':') = 'E1'][1]" />
        </xsl:element>
      </xsl:for-each>
    </root>
  </xsl:template>

  <!-- Process each row -->
  <xsl:template match="/root/row">
    <!-- Create an element whose name is whatever is before the colon in the text -->
    <xsl:element name="{substring-before(text(), ':')}">
      <!-- Output everything after the colon -->
      <xsl:value-of select="normalize-space(substring-after(text(), ':'))" />
      <!-- Special treatment for the C1 node -->
      <xsl:if test="substring-before(text(), ':') = 'C1'">
        <!-- Count how many A1 nodes exist after this node -->
        <xsl:variable name="remainingA1nodes" select="count(following-sibling::*[substring-before(text(), ':') = 'A1'])" />
        <!-- Loop through all following-siblings that don't have a colon at position 3, and still have the same number of following A1 rows as this one does -->
        <xsl:for-each select="following-sibling::*[substring(text(), 3, 1) != ':'][count(following-sibling::*[substring-before(text(), ':') = 'A1']) = $remainingA1nodes]">
          <xsl:text> </xsl:text>
          <xsl:value-of select="." />
        </xsl:for-each>
      </xsl:if>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
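If speed on large files becomes a problem, the usual XSLT 1.0 substitute for repeated following-sibling scans is an xsl:key. The sketch below is an alternative technique, not the method used above: it indexes every row that is not an A1 row under the generate-id() of the nearest preceding A1 row, so each record can be fetched with a single key() call. It reads normalize-space(.) of each row instead of text(), which is intended to let the same stylesheet handle both the flat layout and the <row><Field>...</Field></row> layout from the update. Like the stylesheet above, it assumes every heading is two characters long (so the colon sits at position 3) and that every row without a heading belongs to C1; the key name rowsByRecord is hypothetical. The key's use expression still walks back through preceding siblings, so how much faster this runs depends on the processor.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Drop whitespace-only text nodes so normalize-space(.) of a row is just its line -->
  <xsl:strip-space elements="*"/>

  <!-- Index every non-A1 row by the generated id of the nearest preceding A1 row -->
  <xsl:key name="rowsByRecord"
           match="row[not(starts-with(normalize-space(.), 'A1:'))]"
           use="generate-id(preceding-sibling::row[starts-with(normalize-space(.), 'A1:')][1])"/>

  <xsl:template match="/root">
    <root>
      <!-- One output <row> per A1 row -->
      <xsl:for-each select="row[starts-with(normalize-space(.), 'A1:')]">
        <!-- Every row after this A1 and before the next A1 -->
        <xsl:variable name="rest" select="key('rowsByRecord', generate-id())"/>
        <row>
          <!-- The A1 row plus every row whose 3rd character is ':' (assumes two-character headings) -->
          <xsl:for-each select=". | $rest[substring(normalize-space(.), 3, 1) = ':']">
            <xsl:element name="{substring-before(normalize-space(.), ':')}">
              <xsl:value-of select="normalize-space(substring-after(., ':'))"/>
              <!-- Rows without a heading are appended to C1 -->
              <xsl:if test="starts-with(normalize-space(.), 'C1:')">
                <xsl:for-each select="$rest[substring(normalize-space(.), 3, 1) != ':']">
                  <xsl:text> </xsl:text>
                  <xsl:value-of select="normalize-space(.)"/>
                </xsl:for-each>
              </xsl:if>
            </xsl:element>
          </xsl:for-each>
        </row>
      </xsl:for-each>
    </root>
  </xsl:template>
</xsl:stylesheet>
```

Because the header elements are emitted in document order, the output keeps whatever order A1-E1 appear in within each record rather than forcing A1, B1, C1, D1, E1.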