xquery, xqilla

Parse escaped HTML into a node in XQilla


I'm trying to get the text of the description element from an RSS 2.0 feed using XQilla. The address is here. Getting the text works, but the element contains escaped HTML like

<a href="some_address">...

It would be useful to have this HTML as a node so I can work with it further, but I'm at a loss here. I can get the element's contents with

let $desc := $item/*[name()='description']

but I don't know how to unescape it. I tried parse-html, but it only strips the tags and returns the remaining text as a string, much like the data() function. Searching the web suggests that extension functions exist for this, but only in other implementations. Is there a way to do it in XQilla? By the way, the code I'm working on is a JAWS ResearchIt lookup source.


Solution

  • Like many other XQuery implementations, XQilla provides proprietary functions to parse XML and HTML from a string (XQilla's documentation page has no anchors, so you'll have to scroll through it to find them, sorry).

    xqilla:parse-xml($xml as xs:string?) as document-node()?
    xqilla:parse-html($html as xs:string?) as document-node()?
    

    If $desc contains the escaped HTML, xqilla:parse-html($desc) returns it parsed into a document node, which you can then navigate with ordinary path expressions.
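A minimal sketch of the whole lookup, under a few assumptions: "feed.xml" is a hypothetical local copy of the feed standing in for the real address, the xqilla prefix is bound by default (as the signatures above suggest), and your XQilla build provides xqilla:parse-html (as far as I know it depends on XQilla being built with HTML Tidy). It parses each item's description and returns the href of every link inside it.

```xquery
(: Sketch only. "feed.xml" is a hypothetical local copy of the RSS 2.0 feed.
   Assumes the xqilla prefix is predeclared and parse-html is available. :)
for $item in doc("feed.xml")/rss/channel/item
(: string() yields the element's text, i.e. the HTML markup
   as a plain string, already unescaped by the XML parser :)
let $desc := string($item/*[name()='description'])
(: parse that string as HTML; the result is a document node :)
let $html := xqilla:parse-html($desc)
(: *:a matches <a> elements in any namespace, in case the
   parse result places them in the XHTML namespace :)
for $a in $html//*:a
return string($a/@href)
```

xqilla:parse-xml($desc) could be used the same way if the description happened to hold well-formed XML, but escaped HTML in feeds often isn't well-formed (unclosed tags, entities such as &nbsp;), which is why parse-html is the safer choice here.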