xslt-2.0

Unescape text for subsequent XSLT 2 processing


I get an input file containing elements like

<item>
<Description>
    Intro 1
    &lt;b&gt;Title&lt;/b&gt;
    Intro 2
    &lt;ul&gt;
    &lt;li&gt;item 1&lt;/li&gt;
    &lt;li&gt;&lt;b&gt;item 2&lt;/b&gt;&lt;/li&gt;
    &lt;/ul&gt;
    Finish
</Description>
</item>

I would like to create an XSLT 2 template or function that converts this to a node() like

<item>
<Description>
    Intro 1
    <b>Title</b>
    Intro 2
    <ul>
    <li>item 1</li>
    <li><b>item 2</b></li>
    </ul>
    Finish
</Description>
</item>

to process it further.

Any recommendations on how to achieve this?


Solution

  • David Carlisle has implemented an HTML parser in XSLT 2; you can find it at https://github.com/davidcarlisle/web-xslt/blob/master/htmlparse/htmlparse.xsl and use it, e.g., as follows (this example uses XSLT 3's xsl:mode, so it needs an XSLT 3 processor):

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:d="data:,dpc"
        exclude-result-prefixes="#all"
        version="3.0">
        
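      <!-- import David Carlisle's parser, which defines d:htmlparse -->
      <xsl:import href="https://github.com/davidcarlisle/web-xslt/raw/master/htmlparse/htmlparse.xsl"/>
    
      <!-- XSLT 3: copy every node not matched by another template unchanged -->
      <xsl:mode on-no-match="shallow-copy"/>
    
      <!-- replace the escaped text in Description with the parsed nodes -->
      <xsl:template match="Description">
          <xsl:copy>
              <xsl:apply-templates select="d:htmlparse(., '', true())/node()"/>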
          </xsl:copy>
      </xsl:template>
      
    </xsl:stylesheet>
    

    to get a result like

    <item>
    <Description>
        Intro 1
        <b>Title</b>
        Intro 2
        <ul>
        <li>item 1</li>
        <li><b>item 2</b></li>
        Finish
    </ul></Description>
    </item>
    

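    If the escaped markup is well-formed XML, you could also use XSLT 3/XPath 3's parse-xml-fragment function instead. A sample lacking the closing </ul>, however, can't be parsed as XML, while htmlparse recovers from it, as the result above shows by closing the ul only at the end.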
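The stylesheet in the answer declares version="3.0" and relies on xsl:mode, so it needs an XSLT 3 processor. Since the question asks about XSLT 2, here is a minimal sketch of the same approach for an XSLT 2.0 processor. It assumes htmlparse.xsl can be imported from the same location and that d:htmlparse is called exactly as in the answer; the classic identity template stands in for on-no-match="shallow-copy".

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="data:,dpc"
    exclude-result-prefixes="#all"
    version="2.0">

  <!-- same import as in the answer; must come before any other declaration -->
  <xsl:import href="https://github.com/davidcarlisle/web-xslt/raw/master/htmlparse/htmlparse.xsl"/>

  <!-- identity template: the XSLT 2.0 equivalent of on-no-match="shallow-copy" -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- higher default priority than node(), so it wins for Description -->
  <xsl:template match="Description">
    <xsl:copy>
      <xsl:apply-templates select="d:htmlparse(., '', true())/node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Templates in the importing stylesheet have higher import precedence than imported ones, so these two templates take priority over any default-mode templates htmlparse.xsl might define.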
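When the escaped markup is well-formed, a sketch that needs no imported parser could use the standard parse-xml-fragment function (XPath/XSLT 3.0). The string value of Description, which the XML parser has already unescaped, is parsed as an external general parsed entity, so a mix of text and elements is allowed. Unlike htmlparse, it does not repair malformed markup: an unclosed <ul>, for instance, makes it raise the dynamic error FODC0006.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <!-- copy every node not matched by another template unchanged -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="Description">
    <xsl:copy>
      <!-- parse the unescaped text as an XML fragment; the text and
           elements such as <b> and <ul> come back as real nodes -->
      <xsl:apply-templates select="parse-xml-fragment(.)/node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the parsed nodes are passed to xsl:apply-templates rather than copied directly, further templates (for example, one matching b or li) can process them in the same pass.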