Authors of an XML document did not include all the text inside an element that will be converted to a hyperlink. I would like to process or pre-process the XML so that the element includes the necessary text. I find this hard to describe, but a simple example should show what I'm attempting. I'm using XSLT 2.0. I already do regular expression processing for various situations but can't figure this one out.
I know how to do this with Perl/Python regular expressions, but I can't figure out how to approach it with XSLT.
Here is very simplified XML from an author, in which they left the ' (Sheet 3)' out of the glink element:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<para>
Go look at figure <glink refid="1">Figure 22</glink> (Sheet 3). Then go do something else.
</para>
</root>
Here is what I'd like it converted to, with the ' (Sheet 3)' now inside the glink tag:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<para>
Go look at figure <glink refid="1">Figure 22 (Sheet 3)</glink>. Then go do something else.
</para>
</root>
The conversion should happen whenever a glink element is immediately followed by text matching this regular expression:
\s\(Sheet \d\)
I currently have 2 XSLTs. The first pre-processes the XML to convert a number of other situations (using regular expressions with xsl:analyze-string). The second converts the pre-processed XML to HTML. The second XSLT has a template that handles glink elements and turns each one into a hyperlink, but the hyperlink should include the Sheet information.
I would assume that it is easier to pre-process this first and leave the 2nd XSLT alone, but I always appreciate better ways.
Thank you for your time.
In order to reduce the use of regex functions, I would use this approach:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- Identity rule: copy everything that no other template handles -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="glink">
        <!-- Split the text node right after the glink into the leading
             " (Sheet N)" part (match) and the remainder (no-match) -->
        <xsl:variable name="vAnalyzedString">
            <xsl:analyze-string
                select="following-sibling::node()[1][self::text()]"
                regex="^\s*\(Sheet\s+\d+\)">
                <xsl:matching-substring>
                    <match>
                        <xsl:value-of select="."/>
                    </match>
                </xsl:matching-substring>
                <xsl:non-matching-substring>
                    <no-match>
                        <xsl:value-of select="."/>
                    </no-match>
                </xsl:non-matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <!-- Copy the glink with the sheet text appended inside it -->
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
            <xsl:apply-templates
                select="$vAnalyzedString/match/text()"/>
        </xsl:copy>
        <!-- Output the rest of the following text after the glink -->
        <xsl:apply-templates
            select="$vAnalyzedString/no-match/text()"/>
    </xsl:template>
    <!-- The text node right after a glink was already output above -->
    <xsl:template match="text()[preceding-sibling::node()[1][self::glink]]"/>
</xsl:stylesheet>
Output:
<root>
<para>
Go look at figure <glink refid="1">Figure 22 (Sheet 3)</glink>. Then go do something else.
</para>
</root>
Do note: every glink is processed, but the text node that is its first following sibling is not copied on its own; the empty template suppresses it, because the glink template has already output its content. It's possible to do this with the xsl:analyze-string instruction, but you need to declare a variable holding the partial results and then navigate those results. This approach also makes it easy to process those (now separate) text nodes further, and it runs the regex only once per glink.
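For comparison, here is a minimal sketch of the same pre-processing step written with the matches() and replace() functions instead of xsl:analyze-string and a temporary variable. The variable names vSheet and vNext are my own and not part of the stylesheet above, and I have only traced this by hand against the sample input, so treat it as a starting point rather than a tested solution.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- Hypothetical name: the " (Sheet N)" pattern, without the ^ anchor -->
    <xsl:variable name="vSheet" select="'\s*\(Sheet\s+\d+\)'"/>

    <!-- Identity rule: copy everything that no other template handles -->
    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <!-- glink: copy it and append the sheet text found at the start
         of the immediately following text node, if there is any -->
    <xsl:template match="glink">
        <xsl:variable name="vNext"
            select="following-sibling::node()[1][self::text()]"/>
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
            <xsl:if test="matches($vNext, concat('^', $vSheet))">
                <!-- 's' flag lets . match newlines, so only group 1 is kept -->
                <xsl:value-of
                    select="replace($vNext, concat('^(', $vSheet, ').*$'), '$1', 's')"/>
            </xsl:if>
        </xsl:copy>
    </xsl:template>

    <!-- Text right after a glink: drop the leading sheet text, keep the rest -->
    <xsl:template match="text()[preceding-sibling::node()[1][self::glink]]">
        <xsl:value-of select="replace(., concat('^', $vSheet), '')"/>
    </xsl:template>
</xsl:stylesheet>
```

For the sample input this should give the same output as shown above. The trade-off is that the regex is evaluated up to three times per glink (matches() plus two replace() calls), which is exactly what the single xsl:analyze-string pass avoids.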