
XSLT 1.0: Processing an encoded string to get child nodes


It seems I am trying to achieve the unachievable, but here it is: I have an XML structure like this:

<?xml version="1.0" encoding="UTF-8"?>
<article>
       <h1>&lt;p mystyle=&quot;texte&quot;&gt;&lt;b&gt;Barème&lt;/b&gt; sur 20 points&lt;/p&gt;</h1>
</article>

If I call value-of with disable-output-escaping="yes", I get a nicely formatted structure:

<?xml version="1.0" encoding="UTF-8"?>
<article>
<h1>
    <p mystyle="texte"><b>Barème</b> sur 20 points</p>
</h1>
</article>

The problem is that I would like to get only the children of the p tag:

<?xml version="1.0" encoding="UTF-8"?>
<article>
<h1>
    <b>Barème</b> sur 20 points
</h1>
</article>

All of my attempts to reach only the children fail.

  1. Storing the unescaped string in a variable and processing the variable

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="1.0" >
    <xsl:template match="/">
      <xsl:apply-templates select="*"></xsl:apply-templates>
    </xsl:template>
    <xsl:template match="h1">
        <xsl:variable name="unescaped">
            <xsl:value-of select="." disable-output-escaping="yes"/>
        </xsl:variable>
        <xsl:value-of select="$unescaped/*"/>
    </xsl:template>
</xsl:stylesheet>

  2. Trying an XPath like ./* (empty result)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="1.0" >
    
    <xsl:template match="/">
      <xsl:apply-templates select="*"></xsl:apply-templates>
    </xsl:template>
    
    <xsl:template match="h1">
        <xsl:value-of select="./*" disable-output-escaping="yes"/>
    </xsl:template>
    
</xsl:stylesheet>

I have read that one limitation of disable-output-escaping is that the resulting "nodes" cannot be processed. Is that so?

Thanks in advance for any hint.


Solution

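  • If you are limited to XSLT 1.0, you must do the transformation in two passes: first, disable output escaping on h1 and save the result to a file; then process the resulting file as "normal" XML, using another stylesheet. The reason is that disable-output-escaping only takes effect when the result is serialized, so within a single transformation the unescaped markup never exists as nodes that XPath could select.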

    Alternatively, you could try to extract the content from the escaped string using string functions and output the result with escaping disabled, something like:

    <xsl:template match="h1">
        <xsl:copy>
            <xsl:value-of select="substring-before(substring-after(., '>'), '&lt;/p>')" disable-output-escaping="yes"/>
        </xsl:copy>
    </xsl:template>
    

    but this can easily fail if your input does not follow the same structure as your example. And of course the result, being a string, can only be output as is; no further processing is possible within the same stylesheet, except by string functions.
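
The second pass of the two-pass approach could look roughly like the sketch below. It assumes the first pass has already written the unescaped output shown in the question (article/h1/p with real b markup) to a file, and that this file is well-formed XML. The identity template and the h1/p match are illustrative choices, not the only way to write it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Second-pass stylesheet (a sketch): its input is the saved output of the
     first pass, where p and b are real elements rather than escaped text. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <!-- Identity template: copy every node that has no more specific rule -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Drop the p wrapper directly under h1, but keep processing its children -->
    <xsl:template match="h1/p">
        <xsl:apply-templates select="node()"/>
    </xsl:template>

</xsl:stylesheet>
```

Because the p element is now an ordinary node, its children (the b element and the trailing text) are copied into h1 by the identity template, and they remain available to any further templates you add to this second stylesheet.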