My simplified input looks like this:
<stuff>
<p>CAPITALWORD is part of <i>mixed</i> content.</p>
<p>ANOTHER is <i>here</i> but it's not the only one. SOMEWORDS are <i>mixed up</i> in the <i>same</i>
paragraph. SOMETIMES even <i>multiple times.</i></p>
</stuff>
Now, my goal is to split paragraphs on each full-caps word. I thought I would go for grouping text starting with at least two capital letters like this:
<xsl:output method="xml" indent="true"></xsl:output>
<xsl:mode on-no-match="shallow-copy"/>
<xsl:template match="p">
<xsl:for-each-group select="node()" group-starting-with="text()[matches(., '[A-Z]{2,}')]">
<xsl:element name="p" >
<xsl:apply-templates select="current-group()"/>
</xsl:element>
</xsl:for-each-group>
</xsl:template>
But this doesn't work, because I'm dealing with mixed content rather than plain strings: a group can only start at a whole text node, not at a word in the middle of one. So I get this:
<stuff>
<p>CAPITALWORD is part of <i>mixed</i> content.</p>
<p>ANOTHER is <i>here</i>
</p>
<p> but it's not the only one. SOMEWORDS are <i>mixed up</i> in the <i>same</i>
</p>
<p>
paragraph. SOMETIMES even <i>multiple times.</i>
</p>
</stuff>
instead of the desired output:
<stuff>
<p>CAPITALWORD is part of <i>mixed</i> content. </p>
<p>ANOTHER is <i>here</i> but it's not the only one. </p>
<p>SOMEWORDS are <i>mixed up</i> in the <i>same</i> paragraph. </p>
<p>SOMETIMES even <i>multiple times.</i></p>
</stuff>
I will be most grateful for tips on how to achieve the desired output.
One approach is a two-step transformation: the first step uses analyze-string on the text nodes to wrap each all-caps word in an element, and the second step can then use group-starting-with on those wrapper elements:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Step 2: group the marked-up children; each wrapper element starts a new p -->
  <xsl:template match="p">
    <!-- Step 1 result: the children of p with all-caps words wrapped -->
    <xsl:variable name="capitalized-marked-up" as="node()*">
      <xsl:apply-templates mode="markup-capitalized"/>
    </xsl:variable>
    <xsl:for-each-group select="$capitalized-marked-up" group-starting-with="capitalized-word">
      <p>
        <xsl:apply-templates select="current-group()"/>
      </p>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Drop the temporary wrapper again, keeping only its text -->
  <xsl:template match="capitalized-word">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:mode name="markup-capitalized" on-no-match="shallow-copy"/>

  <!-- Step 1: split each text node into matching and non-matching parts -->
  <xsl:template mode="markup-capitalized" match="text()">
    <xsl:apply-templates select="analyze-string(., '\p{Lu}{2,}')" mode="wrap"/>
  </xsl:template>

  <!-- Wrap each run of two or more uppercase letters; non-matching text is
       copied by the built-in rules of the "wrap" mode -->
  <xsl:template mode="wrap" match="fn:match">
    <capitalized-word>{.}</capitalized-word>
  </xsl:template>

  <xsl:output indent="yes"/>

</xsl:stylesheet>
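To see what the grouping step actually receives, it can help to run the first step on its own. The following is a minimal sketch of that, reusing the regular expression and the capitalized-word wrapper name from the stylesheet above; running it as a separate stylesheet is only a debugging assumption, not part of the approach itself:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all"
    expand-text="yes">

  <!-- Copy everything unchanged except the text nodes handled below -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- Split each text node into fn:match and fn:non-match parts -->
  <xsl:template match="text()">
    <xsl:apply-templates select="analyze-string(., '\p{Lu}{2,}')" mode="wrap"/>
  </xsl:template>

  <!-- Keep the wrapper in the output so the markup is visible;
       non-matching text is copied by the built-in rules of the "wrap" mode -->
  <xsl:template mode="wrap" match="fn:match">
    <capitalized-word>{.}</capitalized-word>
  </xsl:template>

</xsl:stylesheet>
```

For the first sample paragraph this should produce <p><capitalized-word>CAPITALWORD</capitalized-word> is part of <i>mixed</i> content.</p>. Note that an all-caps word inside an <i> element is wrapped as well, but in the full stylesheet it would not start a new group, because only the top-level items of the variable are tested against group-starting-with.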