I have an XML document that consists of a top-level topic, followed by an optional subtopic, followed by a table. I want to reorganize the whole thing into a table in which the topic and subtopic are columns. The input looks like this:
<topic>
<title>Some Category</title>
<topic>
<title>Some Subcategory</title>
<table>
<tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
...
</table>
</topic>
...
</topic>
...
<topic>
<title>Some Category</title>
<table>
<tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
...
</table>
</topic>
The output I want looks like this:
<table>
<tr><td>Some Category</td><td>Some Subcategory</td><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
...
</table>
<table>
<tr><td>Some Category</td><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
...
</table>
I've just started learning about XMLStarlet, which seems like it might be the right tool for this job, but I haven't figured out how to deal with that optional subtopic layer.
Answering my own question.
I figured out how to write a Bash script that does this, but an XSLT transform seems more robust and faster. I haven't tested this extensively, but it seems to work. I am a novice at XSLT, so take it with several grains of salt.
Given input like this:
<topic>
<title>Some Category</title>
<topic>
<title>Some Subcategory</title>
<table>
<tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
...
</table>
</topic>
...
</topic>
<topic>
<title>Some Category</title>
<table>
<tr><td>Value 1</td><td>Value 2</td><td>Value 3</td></tr>
...
</table>
</topic>
The stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>Test doc</title>
      </head>
      <body>
        <!-- Case 1: the top-level topic contains subtopics. -->
        <xsl:if test="topic/topic">
          <xsl:for-each select="topic">
            <xsl:variable name="topic_title" select="title/text()" />
            <xsl:for-each select="topic">
              <xsl:variable name="subtopic_title" select="title/text()" />
              <h1><xsl:copy-of select="$topic_title" /></h1>
              <h2><xsl:copy-of select="$subtopic_title" /></h2>
              <table>
                <!-- table/tr is relative to the current subtopic; //tr would match every row in the document. -->
                <xsl:for-each select="table/tr">
                  <tr><td><xsl:copy-of select="$topic_title" /></td><td><xsl:copy-of select="$subtopic_title" /></td><xsl:copy-of select="*" /></tr>
                </xsl:for-each>
              </table>
            </xsl:for-each>
          </xsl:for-each>
        </xsl:if>
        <!-- Case 2: the table sits directly under the top-level topic. -->
        <xsl:if test="not(topic/topic)">
          <xsl:for-each select="topic">
            <xsl:variable name="topic_title" select="title/text()" />
            <h1><xsl:copy-of select="$topic_title" /></h1>
            <table>
              <xsl:for-each select="table/tr">
                <tr><td><xsl:copy-of select="$topic_title" /></td><xsl:copy-of select="*" /></tr>
              </xsl:for-each>
            </table>
          </xsl:for-each>
        </xsl:if>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
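A possible way to avoid handling the optional subtopic layer as two separate cases is to let each row look upward for whatever topics enclose it. The following is only a sketch of that idea, not something tested against the real data. It assumes every table is a direct child of a topic and that the title of each topic should become one leading cell, outermost topic first.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head>
        <title>Test doc</title>
      </head>
      <body>
        <!-- Visit every table that is a child of a topic, at any nesting depth. -->
        <xsl:apply-templates select="//topic/table" />
      </body>
    </html>
  </xsl:template>

  <!-- Emit one output table per input table. -->
  <xsl:template match="table">
    <table>
      <xsl:apply-templates select="tr" />
    </table>
  </xsl:template>

  <!-- Prefix each row with the titles of all enclosing topics. -->
  <xsl:template match="tr">
    <tr>
      <!-- xsl:for-each visits nodes in document order, so the outermost topic comes first. -->
      <xsl:for-each select="ancestor::topic">
        <td><xsl:value-of select="title" /></td>
      </xsl:for-each>
      <!-- Copy the original cells unchanged. -->
      <xsl:copy-of select="td" />
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

With this version, a row under a subtopic gets two leading cells and a row directly under a top-level topic gets one, so no test for the presence of a subtopic is needed. Assuming the stylesheet is saved as rows.xsl and the document as input.xml (both names are hypothetical), it could be run with either xmlstarlet tr rows.xsl input.xml or xsltproc rows.xsl input.xml.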