I am trying to use XSLT to transform the XML document extracted from an unzipped Microsoft DOCX file. With the following stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="w:body">
    <body>
      <xsl:for-each select="w:p">
        <p>
          <xsl:for-each select="w:r">
            <xsl:value-of select="w:t"/>
          </xsl:for-each>
        </p>
      </xsl:for-each>
    </body>
  </xsl:template>
</xsl:stylesheet>
I am able to obtain the following XML fragment:
<?xml version="1.0" encoding="UTF-8"?>
<body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<p></p>
<p></p>
<p>Mastering</p>
<p>Front-End Web Development</p>
<p>14 Books in 1</p>
<p>Introducing 200+ ExtensionsAn Advanced Guide</p>
<p></p>
<p></p>
<p></p>
......
Note that the stylesheet concatenates the fragmented text parts (the <w:t> elements inside the <w:r> runs) and drops the unwanted tags of the original XML document.
How can I now pass this result to the next stage of the XSLT pipeline so that all the empty <p> elements are removed?
Since 2017 we have XSLT 3.0, which provides xsl:where-populated. It seems to be what you want here; wrap the <p> construction in it, as in
<xsl:where-populated>
  <p>
    <xsl:for-each select="w:r">
      <xsl:value-of select="w:t"/>
    </xsl:for-each>
  </p>
</xsl:where-populated>
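So if you can use an XSLT 3.0 processor such as Saxon 9.8 or later, Saxon JS 2, or AltovaXML 2017 R3 or later, this is quite easy: any <p> that ends up with no content is simply not written to the result.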
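If only an XSLT 2.0 processor is available, the two-stage pipeline asked about in the question can be approximated inside a single stylesheet: capture the first result in a variable and run a second set of templates over it in a separate mode. The following is a minimal sketch, not a definitive solution; the variable name pass1 and the mode name cleanup are made up for illustration, and the unused fn namespace declaration has been left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

  <!-- Entry point: run stage 1 into a temporary tree, then feed it to stage 2.
       "pass1" and "cleanup" are hypothetical names chosen for this sketch. -->
  <xsl:template match="/">
    <xsl:variable name="pass1">
      <xsl:apply-templates/>
    </xsl:variable>
    <xsl:apply-templates select="$pass1/node()" mode="cleanup"/>
  </xsl:template>

  <!-- Stage 1: the transformation from the question, unchanged -->
  <xsl:template match="w:body">
    <body>
      <xsl:for-each select="w:p">
        <p>
          <xsl:for-each select="w:r">
            <xsl:value-of select="w:t"/>
          </xsl:for-each>
        </p>
      </xsl:for-each>
    </body>
  </xsl:template>

  <!-- Stage 2: identity copy of everything produced by stage 1 ... -->
  <xsl:template match="@* | node()" mode="cleanup">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" mode="cleanup"/>
    </xsl:copy>
  </xsl:template>

  <!-- ... except <p> elements that have no child nodes, which are dropped -->
  <xsl:template match="p[not(node())]" mode="cleanup"/>

</xsl:stylesheet>
```

The empty-template rule wins over the identity rule because a pattern with a predicate has a higher default priority. If paragraphs containing only whitespace should also be removed, the match pattern could be changed to p[not(normalize-space())].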