
Munging XML with variable nesting


I have an XML document that consists of a top-level topic, followed by an optional subtopic, followed by a table. I want to reorganize the whole thing into a table in which the topic and subtopic titles are the leading columns.

Source 1

<topic>
    <title>Some Category</title>
    <topic>
        <title>Some Subcategory</title>
        <table>
            <tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
            ...
        </table>
    </topic>
    ...
</topic>
...

Source 2

<topic>
    <title>Some Category</title>
    <table>
        <tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
        ...
    </table>
</topic>

Target 1

<table>
    <tr><td>Some Category</td><td>Some Subcategory</td><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
    ...
</table>

Target 2

<table>
    <tr><td>Some Category</td><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
    ...
</table>

I've just started learning about XMLStarlet, which seems like it might be the right tool for this job, but I haven't figured out how to deal with that optional subtopic layer.


Solution

  • Answering my own question.

    I figured out how to write a bash script that would do this, but an XSLT transform seems more robust and faster. I haven't tested it extensively, but it seems to work. I am a novice at XSLT, so take this with several grains of salt.

    Source

    <topic>
        <title>Some Category</title>
        <topic>
            <title>Some Subcategory</title>
            <table>
                <tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
                ...
            </table>
        </topic>
        ...
    </topic>
    <topic>
        <title>Some Category</title>
        <table>
            <tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
            ...
        </table>
    </topic>
    

    XSLT

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 
    <xsl:template match="/">
    <html>
    <head>
    <title>Test doc</title>
    </head>
    <body>
    <!-- Case 1: the top-level topic contains subtopics -->
    <xsl:if test="topic/topic">
        <xsl:for-each select="topic">
            <xsl:variable name="topic_title" select="title/text()" />
            <xsl:for-each select="topic">
                <xsl:variable name="subtopic_title" select="title/text()" />
                <xsl:copy-of select="$topic_title" /> <xsl:copy-of select="$subtopic_title" />
                <table>
                    <!-- table/tr is relative to the current subtopic; //tr would match every row in the document -->
                    <xsl:for-each select="table/tr">
                        <tr><td><xsl:copy-of select="$topic_title" /></td><td><xsl:copy-of select="$subtopic_title" /></td><xsl:copy-of select="*" /></tr>
                    </xsl:for-each>
                </table>
            </xsl:for-each>
        </xsl:for-each>
    </xsl:if>
    <!-- Case 2: the table sits directly under the top-level topic -->
    <xsl:if test="not(topic/topic)">
        <xsl:for-each select="topic">
            <xsl:variable name="topic_title" select="title/text()" />
            <xsl:copy-of select="$topic_title" />
            <table>
                <xsl:for-each select="table/tr">
                    <tr><td><xsl:copy-of select="$topic_title" /></td><xsl:copy-of select="*" /></tr>
                </xsl:for-each>
            </table>
        </xsl:for-each>
    </xsl:if>
    </body></html>
    </xsl:template>
    </xsl:stylesheet>
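
    The stylesheet above handles the optional subtopic with two separate branches. A possible alternative, offered only as a sketch, is to start from each table row and walk up its ancestor::topic axis, so that any nesting depth is covered by one loop. This assumes the input has a single root element (the two sibling topics in the Source above would need a wrapper element, whose name does not matter here) and that a bare table, rather than a full HTML page, is the desired output. Rows from topics at different depths will end up with different numbers of leading cells.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

      <xsl:template match="/">
        <table>
          <!-- Visit every row of every table that sits inside a topic, at any depth -->
          <xsl:for-each select="//topic/table/tr">
            <tr>
              <!-- xsl:for-each processes nodes in document order,
                   so the outermost topic title comes first -->
              <xsl:for-each select="ancestor::topic">
                <td><xsl:value-of select="title"/></td>
              </xsl:for-each>
              <!-- Then copy the row's original cells unchanged -->
              <xsl:copy-of select="td"/>
            </tr>
          </xsl:for-each>
        </table>
      </xsl:template>
    </xsl:stylesheet>
    ```

    Saved under a hypothetical name such as flatten.xsl, it could be run with "xmlstarlet tr flatten.xsl input.xml" or "xsltproc flatten.xsl input.xml", where input.xml is a placeholder for the source document.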