I am trying to retrieve the table of contents (TOC) from a docx file's document.xml using XSLT.
Here is my XSLT:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:sap="http://www.sap.com/sapxsl" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" exclude-result-prefixes="w" version="2.0">
    <xsl:output indent="yes" method="xml"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="w:sdt">
        <xsl:element name="root">
            <xsl:attribute name="label">
                <xsl:value-of select="w:sdtPr/w:docPartObj/w:docPartGallery/@w:val"/>
            </xsl:attribute>
            <xsl:for-each select="w:sdtContent/w:p">
                <xsl:if test="w:pPr/w:pStyle/@w:val">
                    <xsl:element name="sec">
                        <xsl:attribute name="label">
                            <xsl:value-of select="w:pPr/w:pStyle/@w:val"/>
                        </xsl:attribute>
                        <xsl:attribute name="anchor">
                            <xsl:value-of select="w:hyperlink/@w:anchor"/>
                        </xsl:attribute>
                        <xsl:attribute name="title">
                            <xsl:value-of select="w:hyperlink/w:r/w:t"/>
                        </xsl:attribute>
                    </xsl:element>
                </xsl:if>
            </xsl:for-each>
        </xsl:element>
    </xsl:template>
</xsl:transform>
I am getting the desired result, but with additional w:p text values from outside the w:sdtContent scope.
I am a beginner in XSLT and not sure what I am doing wrong here.
(If the source XML would help, please let me know and I will post it here.)
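XSLT processes its input, starting from the root node, with a set of default (built-in) rules. These default rules can be overridden, but your stylesheet doesn't do that. I suspect the unwanted extra output you see comes from the default rules: for any node that has no matching template, the processor descends into its children and copies the text nodes it finds to the output.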
Your stylesheet contains a template <xsl:template match="w:sdt"> and the XSLT processor does run that template, but only when it gets to a <w:sdt> while it traverses the input document.
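For reference, the built-in rules behave roughly as if the templates below were part of every stylesheet. This is a paraphrase of the defaults described in the XSLT specification, shown only to illustrate where the extra text comes from; you would not normally write them yourself.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Root node and elements: output nothing themselves,
         but keep walking down into the children. -->
    <xsl:template match="*|/">
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Text and attribute nodes: copy the string value to the output. -->
    <xsl:template match="text()|@*">
        <xsl:value-of select="."/>
    </xsl:template>

    <!-- Processing instructions and comments: ignored. -->
    <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

A <w:p> outside of <w:sdt> has no template of its own, so the element rule walks down to its <w:t> text nodes and the text rule copies them to the result.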
If you want to start at the root node yourself and dictate what nodes the XSLT processor should look at, override the default behavior by writing a template that matches the root node (<xsl:template match="/">).
<xsl:transform
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sap="http://www.sap.com/sapxsl"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w"
>
    <xsl:output indent="yes" method="xml" />
    <xsl:strip-space elements="*" />

    <!-- Start at the root node and visit only the w:sdt elements. -->
    <xsl:template match="/">
        <xsl:apply-templates select="//w:sdt" />
    </xsl:template>

    <!-- One <root> per w:sdt; process only paragraphs that carry a style. -->
    <xsl:template match="w:sdt">
        <root label="{w:sdtPr/w:docPartObj/w:docPartGallery/@w:val}">
            <xsl:apply-templates select="w:sdtContent/w:p[w:pPr/w:pStyle/@w:val]" />
        </root>
    </xsl:template>

    <!-- One <sec> per selected TOC paragraph. -->
    <xsl:template match="w:sdtContent/w:p">
        <sec
            label="{w:pPr/w:pStyle/@w:val}"
            anchor="{w:hyperlink/@w:anchor}"
            title="{w:hyperlink/w:r/w:t}"
        />
    </xsl:template>
</xsl:transform>
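A different way to get rid of the stray text, sketched here as an alternative rather than as part of the approach above, is to leave the default traversal in place and override only the built-in rule for text nodes with an empty template. The two element templates are the same as in the stylesheet above; the unused sap namespace declaration is left out on the assumption that nothing else needs it.

```xml
<xsl:transform
    version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w"
>
    <xsl:output indent="yes" method="xml" />

    <!-- Override the built-in text rule: text nodes reached by the
         default traversal now produce no output. -->
    <xsl:template match="text()" />

    <xsl:template match="w:sdt">
        <root label="{w:sdtPr/w:docPartObj/w:docPartGallery/@w:val}">
            <xsl:apply-templates select="w:sdtContent/w:p[w:pPr/w:pStyle/@w:val]" />
        </root>
    </xsl:template>

    <xsl:template match="w:sdtContent/w:p">
        <sec
            label="{w:pPr/w:pStyle/@w:val}"
            anchor="{w:hyperlink/@w:anchor}"
            title="{w:hyperlink/w:r/w:t}"
        />
    </xsl:template>
</xsl:transform>
```

The processor still walks the whole document here, so the explicit root template is the more direct choice when you only care about one kind of element.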
Other notes:
- Avoid <xsl:element name="foo">. Write <foo>.
- Avoid <xsl:attribute name="bar">. Write <foo bar="{xpath-expr}">.
- Avoid <xsl:for-each>. Use <xsl:apply-templates> and <xsl:template>.
- Don't use <xsl:if> for filtering which nodes you want to process. Write an appropriate XPath expression that only selects nodes you want to process instead.
- Make sure you understand how <xsl:apply-templates> works.
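As a compact illustration of these notes, here is a minimal sketch on a made-up input. The element names list, entry, name, items and item, and the id attribute, are hypothetical and chosen only for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/list">
        <!-- Literal result element instead of xsl:element. -->
        <items>
            <!-- The predicate [@id] does the filtering that would otherwise
                 need an xsl:for-each wrapped around an xsl:if. -->
            <xsl:apply-templates select="entry[@id]"/>
        </items>
    </xsl:template>

    <xsl:template match="entry">
        <!-- Attribute value templates instead of xsl:attribute. -->
        <item id="{@id}" name="{name}"/>
    </xsl:template>

</xsl:stylesheet>
```

Given an input such as <list><entry id="a"><name>First</name></entry><entry><name>Skipped</name></entry></list>, only the first entry is selected and turned into an item element, because the second one has no id attribute.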