
How to remove empty elements in the next stage of XSLT pipeline


I am trying to apply an XSLT transformation to the XML extracted from an unzipped Microsoft DOCX document. With the following stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="w:body">
        <body>
            <xsl:for-each select="w:p">
                <p>
                    <xsl:for-each select="w:r">
                        <xsl:value-of select="w:t"/>
                    </xsl:for-each>
                </p>
            </xsl:for-each>
        </body>
    </xsl:template>
</xsl:stylesheet>

I am able to obtain the following XML fragment:

<?xml version="1.0" encoding="UTF-8"?>
<body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:fn="http://www.w3.org/2005/xpath-functions">
<p></p>
<p></p>
<p>Mastering</p>
<p>Front-End Web Development</p>
<p>14 Books in 1</p>
<p>Introducing 200+ ExtensionsAn Advanced Guide</p>
<p></p>
<p></p>
<p></p>
......

Note that I have concatenated the fragmented text pieces (the <w:t> elements inside the <w:r> elements) and removed the unwanted tags from the original XML document.

Now how can I pass it to the next stage of the XSLT pipeline so that I can remove all the empty <p> elements?


Solution

  • XSLT 3 has been available since 2017, and its xsl:where-populated instruction seems to be what you want here: wrap the <p> constructor in it, as in

        <xsl:where-populated>
            <p>
                <xsl:for-each select="w:r">
                    <xsl:value-of select="w:t"/>
                </xsl:for-each>
            </p>
        </xsl:where-populated>
    

    so if you can use Saxon 9.8 or later, SaxonJS 2, or AltovaXML 2017 R3 or later, it is quite easy.
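
If you have to stay on XSLT 2.0, the literal "next stage of the pipeline" from the question can be sketched inside a single stylesheet: capture the first stage's result in a variable (in XSLT 2.0 this is a temporary tree you can navigate directly) and then process that tree in a separate mode. This is only a minimal sketch, not a tested solution for your document. The variable name `stage1` and the mode name `cleanup` are made up for illustration, `exclude-result-prefixes` was added to keep the `w` namespace declaration out of the output, and "empty" is assumed to mean a `<p>` with no child nodes at all.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    exclude-result-prefixes="w">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

    <xsl:template match="w:body">
        <!-- Stage 1: same logic as in the question, but the result is
             stored in a temporary tree instead of being output directly.
             "stage1" is a hypothetical name. -->
        <xsl:variable name="stage1">
            <body>
                <xsl:for-each select="w:p">
                    <p>
                        <xsl:for-each select="w:r">
                            <xsl:value-of select="w:t"/>
                        </xsl:for-each>
                    </p>
                </xsl:for-each>
            </body>
        </xsl:variable>
        <!-- Stage 2: process the temporary tree in the (hypothetical)
             "cleanup" mode. -->
        <xsl:apply-templates select="$stage1/node()" mode="cleanup"/>
    </xsl:template>

    <!-- Stage 2, identity rule: copy every node unchanged by default. -->
    <xsl:template match="@* | node()" mode="cleanup">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()" mode="cleanup"/>
        </xsl:copy>
    </xsl:template>

    <!-- Stage 2, override: a <p> without any child nodes produces no output.
         Assumption: "empty" means no children; use p[not(normalize-space())]
         instead if whitespace-only paragraphs should go as well. -->
    <xsl:template match="p[not(node())]" mode="cleanup"/>
</xsl:stylesheet>
```

The empty-element template wins over the identity rule because a pattern with a predicate has a higher default priority. Note that this relies on XSLT 2.0 temporary trees; an XSLT 1.0 processor would treat the variable as a result tree fragment and would need a node-set extension function before it could be processed further.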