So in this grotty extruded typesetting product, I sometimes see links and email addresses that have been split apart. Example:
<p>Here is some random text with an email address
<Link>example</Link><Link>@example.com</Link> and here
is more random text with a url
<Link>http://www.</Link><Link>example.com</Link> near the end of the sentence.</p>
Desired output:
<p>Here is some random text with an email address
<email>example@example.com</email> and here is more random text
with a url <ext-link ext-link-type="uri" xlink:href="http://www.example.com/">
http://www.example.com/</ext-link> near the end of the sentence.</p>
Whitespace between the elements does not appear to occur, which is one blessing.
I can tell I need to use an xsl:for-each-group within the p template, but I can't quite see how to put the combined text from the group through the contains() function so as to distinguish emails from URLs. Help?
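If you use group-adjacent with a key of boolean(self::Link), each run of consecutive Link elements ends up in a single group, and you can simply string-join the current-group() to get the combined text, as in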
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xsd"
    version="2.0">
  <xsl:template match="p">
    <xsl:copy>
      <!-- The grouping key is true for runs of adjacent Link elements, false for everything else -->
      <xsl:for-each-group select="node()" group-adjacent="boolean(self::Link)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <!-- Concatenate the text of all Link elements in this run -->
            <xsl:variable name="link-text" as="xsd:string" select="string-join(current-group(), '')"/>
            <xsl:choose>
              <!-- Text starting with http:// or https:// is treated as a URL -->
              <xsl:when test="matches($link-text, '^https?://')">
                <ext-link ext-link-type="uri" xlink:href="{$link-text}">
                  <xsl:value-of select="$link-text"/>
                </ext-link>
              </xsl:when>
              <!-- Anything else is assumed to be an email address -->
              <xsl:otherwise>
                <email><xsl:value-of select="$link-text"/></email>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
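Since the question asks about contains(), here is a variant sketch that uses it on the joined text. It rests on a few assumptions that are not in the question: that any Link run containing '@' and no URL scheme is an email address, that a run matching neither test should be left as it was, and that the rest of the document should be copied unchanged (hence the identity template). The URL test comes first because a URL can itself contain an '@'.

```xml
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    version="2.0">

  <!-- Identity template (an assumption): copy anything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:for-each-group select="node()" group-adjacent="boolean(self::Link)">
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <xsl:variable name="link-text" select="string-join(current-group(), '')"/>
            <xsl:choose>
              <!-- Check for a URL scheme first, since a URL may contain '@' -->
              <xsl:when test="matches($link-text, '^https?://')">
                <ext-link ext-link-type="uri" xlink:href="{$link-text}">
                  <xsl:value-of select="$link-text"/>
                </ext-link>
              </xsl:when>
              <!-- contains() on the joined text picks out email addresses -->
              <xsl:when test="contains($link-text, '@')">
                <email><xsl:value-of select="$link-text"/></email>
              </xsl:when>
              <!-- Neither test matched: keep the original Link elements -->
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Note that neither version appends the trailing slash shown in the desired output; the href and the link text are just the concatenated Link contents. Also, if whitespace ever does turn up between two halves of a split link, the text node will break the group and the halves will be treated as separate links.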