Apply XSLT stylesheet to its own output (Filtering out empty elements)

We're applying an XSL stylesheet to a number of XML files with different structures and tags. We want to use a single XSL stylesheet for all of our files, so that we can simply add new XPaths whenever XML files with new content structures are added.

(I might add that this is for use with Apache Solr, so the output document needs to look a certain way.)

So far we've managed to write the code that copies the various fields, like so:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xalan="http://xml.apache.org/xslt" xmlns:exslt="http://exslt.org/common" version="1.0">
<xsl:output method="xml" encoding="UTF-8" indent="yes" xalan:indent-amount="4" omit-xml-declaration="yes"/>
<xsl:template match="/">
    <xsl:param name="fileName" />
    <xsl:param name="fileURI" />
    <xsl:param name="timeCreatedLong" />
<add>
    <doc>
        <!-- REQUIRED FIELDS. DO NOT CHANGE -->
            <field name="fileName"><xsl:value-of select="$fileName" /></field>
            <field name="fileURI"><xsl:value-of select="$fileURI" /></field>
            <field name="timeCreatedLong"><xsl:value-of select="$timeCreatedLong" /></field>
        <!-- //END OF REQUIRED FIELDS -->

        <!-- DSV INTERNAL XML -->
            <!-- Consignment Identifiers -->
            <field name="consignmentIdentifiers"><xsl:value-of select="//consignmentlist/consignment/consignmentId" /></field>
            <field name="consignmentIdentifiers"><xsl:value-of select="//consignmentlist/consignment/references/reference[@type = 'consignment_number']/value" /></field>
            <!-- //Consignment Identifiers -->

            <!-- Transport company information -->
            <field name="carrier"><xsl:value-of select="//transport/transportservice/carriername" /></field>
            <field name="carrierService"><xsl:value-of select="//transport/transportservice/carrierservicename" /></field>
            <field name="transportMode"><xsl:value-of select="//transport/transportservice/transportmode" /></field>
            <!-- //Transport company information -->
        <!-- //DSV INTERNAL XML -->
        

        
        <!-- POSTEN NORDIC LOGISTICS ORDER.XML -->
            <!-- Consignment Identifiers -->
            <field name="consignmentIdentifiers"><xsl:value-of select="//TransportJob/Consignment/@consignmentId" /></field>
            <!-- //Consignment Identifiers -->

            <!-- Transport company information -->
            <field name="definedBy"><xsl:value-of select="//TransportJob/@definedBy" /></field>
            <field name="carrier"><xsl:value-of select="//TransportJob/@profile" /></field>
            <!-- //Transport company information -->
        <!-- //POSTEN NORDIC LOGISTICS ORDER.XML -->
    </doc>
</add>
</xsl:template>

</xsl:stylesheet>

The output, depending on which file structure was processed, looks something like this:

<add>
<doc>
    <field name="fileName">00373323993931432015_BOOKING.INTERNALXML</field>
    <field name="fileURI">/usr/dropbox/Dropbox/shared/file-search/00373323993931432015_BOOKING.INTERNALXML</field>
    <field name="timeCreatedLong">1377507872000</field>
    <field name="consignmentIdentifiers"/>
    <field name="consignmentIdentifiers">00373323993931432015</field>
    <field name="carrier">DSV</field>
    <field name="carrierService">DSV Mypack</field>
    <field name="transportMode">ROAD</field>
    <field name="consignmentIdentifiers"/>
    <field name="definedBy"/>
    <field name="carrier"/>
</doc>
</add>

As you can see, we have some empty / self-closing elements, which we would like to remove before sending the document to our Solr server.

So the real question is: is there a way to remove the generated empty tags after applying this XSL to the input? As stated above, we would like this to be done in the same XSL file.

Solution

One suggestion to improve things is to have a couple of generic templates that match elements or attributes, and that take a parameter which can be set to the 'name' of the field you wish to output.

The first template would actually output the field element, setting the name attribute accordingly:

<xsl:template match="*|@*">
    <xsl:param name="fieldName" />
    <field name="{$fieldName}">
       <xsl:value-of select="." />
    </field>
</xsl:template>

The other one would be used to ignore such elements or attributes without a value:

<xsl:template match="*[normalize-space()='']|@*[normalize-space()='']" />

(Note that the more specific template, the one whose XPath predicate checks for an empty string, takes priority here over the generic one.)

Then, instead of writing this:

<field name="consignmentIdentifiers">
    <xsl:value-of select="//consignmentlist/consignment/consignmentId" />
</field>

You would write this:

<xsl:apply-templates select="//consignmentlist/consignment/consignmentId">
    <xsl:with-param name="fieldName" select="'consignmentIdentifiers'" />
</xsl:apply-templates>

And similarly for all the other fields you wish to output. Thus, you don't have to worry about writing an xsl:if around each field. It is just a slight change to what you are doing at the moment.
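Putting the pieces above together, a minimal sketch of the whole stylesheet might look like the following. Only three of the question's fields are shown, and the required fileName / fileURI / timeCreatedLong fields are left out for brevity; those omissions are simplifications for illustration, not part of the suggestion itself.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>

<xsl:template match="/">
    <add>
        <doc>
            <!-- Each apply-templates emits a field only if the selected node
                 exists and has a non-blank value -->
            <xsl:apply-templates select="//consignmentlist/consignment/consignmentId">
                <xsl:with-param name="fieldName" select="'consignmentIdentifiers'" />
            </xsl:apply-templates>
            <xsl:apply-templates select="//transport/transportservice/carriername">
                <xsl:with-param name="fieldName" select="'carrier'" />
            </xsl:apply-templates>
            <xsl:apply-templates select="//TransportJob/@profile">
                <xsl:with-param name="fieldName" select="'carrier'" />
            </xsl:apply-templates>
        </doc>
    </add>
</xsl:template>

<!-- Generic template: writes one field element for the matched node -->
<xsl:template match="*|@*">
    <xsl:param name="fieldName" />
    <field name="{$fieldName}">
        <xsl:value-of select="." />
    </field>
</xsl:template>

<!-- More specific template: suppresses nodes whose value is blank -->
<xsl:template match="*[normalize-space()='']|@*[normalize-space()='']" />

</xsl:stylesheet>
```

When an XPath selects nothing (because the file has a different structure), xsl:apply-templates simply does nothing, so no field is written. One behavioural difference to be aware of: if an XPath selects several nodes, this writes one field per node, whereas xsl:value-of in XSLT 1.0 only outputs the first.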

EDIT: If you really wanted to apply the XSLT to its own output...

Then the way to do this is to use a 'two-pass transform'. Ideally, you would use two XSLTs here, but if you want to use only one, then on the 'first pass', instead of simply outputting the new elements, you wrap the existing code in a variable:

<xsl:variable name="HereBeDragons">
   <add>
      <doc>
          <field ...
      </doc>
   </add>
</xsl:variable>

So, you now have a variable containing your current output, complete with empty tags. Now, if you were using XSLT 2.0, you could just do this to start applying templates to the elements in the variable:

<xsl:apply-templates select="$HereBeDragons/*"/>

But in XSLT 1.0, you will probably get an error saying that the variable is not a node-set. In XSLT 1.0, the variable actually stores a 'result tree fragment', which needs to be converted to a node-set before templates can be applied to it. It looks like you are using EXSLT here (the exslt namespace is already declared in your stylesheet), so in this case you should be able to do this:

<xsl:apply-templates select="exslt:node-set($HereBeDragons)/*" />

Now, having started to apply templates to the variable, you can just add templates to process the data as you want. You would have one template for the identity transform:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

And another, to ignore your empty fields:

<xsl:template match="field[normalize-space()='']" />

Be wary, though: these templates would apply to both the first pass and the second pass. If you want a template that matches a specific element to behave differently in the second pass, you may need to use the mode attribute on xsl:template and xsl:apply-templates to distinguish between the two passes.
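As a rough sketch of how the two passes could be kept apart with a mode, assuming a processor that supports the EXSLT node-set function: the variable name firstPass and the mode name cleanup are made up for illustration, and only three of the question's fields are shown.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:exslt="http://exslt.org/common"
                exclude-result-prefixes="exslt" version="1.0">
<xsl:output method="xml" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>

<xsl:template match="/">
    <!-- First pass: build the output, empty fields included, in a variable -->
    <xsl:variable name="firstPass">
        <add>
            <doc>
                <field name="consignmentIdentifiers"><xsl:value-of select="//consignmentlist/consignment/consignmentId" /></field>
                <field name="carrier"><xsl:value-of select="//transport/transportservice/carriername" /></field>
                <field name="carrier"><xsl:value-of select="//TransportJob/@profile" /></field>
            </doc>
        </add>
    </xsl:variable>

    <!-- Second pass: convert the result tree fragment to a node-set and
         process it only with templates in the (hypothetical) cleanup mode -->
    <xsl:apply-templates select="exslt:node-set($firstPass)/*" mode="cleanup" />
</xsl:template>

<!-- Identity template, active in the second pass only -->
<xsl:template match="@*|node()" mode="cleanup">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()" mode="cleanup" />
    </xsl:copy>
</xsl:template>

<!-- Drop fields with no text content, second pass only -->
<xsl:template match="field[normalize-space()='']" mode="cleanup" />

</xsl:stylesheet>
```

Because every second-pass template carries the mode, none of them can fire by accident while the source document is being processed in the first pass.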

Of course, doing a two-pass transform in this way is not that efficient in terms of either memory or speed, which is why the first suggestion is to add logic to the original XSLT so that it does not output empty tags in the first place.