Tags: xml, xslt, tei

Enclose hyphenation in its own element with XSLT


Given the following XML:

<p>
  <lb/>Aber, Schertz bey Seite geſetzet; wer mir und ſo viel ehrlichen
  <lb/>Bieder-Maͤnnern nicht glauben will, der probire es bey den haͤuffigen Kirchen-
  <lb/>Sachen, die ein Sangloser Organiſt etwa geſchmadert hat, (denn es gibt frucht-
  <lb/>bare Naͤchte bey dieſen Leuten,  [...]
</p>

Is there a pure XSLT way to transform this into the following?

<p>
  <lb/>Aber, Schertz bey Seite geſetzet; wer mir und ſo viel ehrlichen
  <lb/>Bieder-Maͤnnern nicht glauben will, der probire es bey den haͤuffigen Kirchen<pc force="strong">-</pc>
  <lb/>Sachen, die ein Sangloser Organiſt etwa geſchmadert hat, (denn es gibt frucht<pc force="weak">-</pc>
  <lb/>bare Naͤchte bey dieſen Leuten,  [...]
</p>

If the first letter following the <lb> element is a capital letter, the force attribute should be strong; otherwise it should be weak.

At the moment I am completely stuck on how to select a text node that ends with a certain character (-) and has an <lb> sibling that is itself followed by a capital letter ...


Solution

  • Using XSLT 3 (though only for declaring the identity transformation with xsl:mode, for the || operator instead of concat(), for the {$sep} text value template, and for the analyze-string() function, which could be replaced by xsl:analyze-string), the following sample

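    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      version="3.0"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      exclude-result-prefixes="#all"
      expand-text="yes">
      
      <!-- the hyphen character to wrap in <pc> (inserted into the regex below unescaped) -->
      <xsl:param name="sep" as="xs:string">-</xsl:param>
      
      <!-- group 1: the hyphen; group 2: any whitespace after it up to the end of the text node -->
      <xsl:param name="pattern" as="xs:string" select="'(' || $sep || ')' || '(\s*)$'"/>
    
      <!-- identity transformation: copy everything no other template matches -->
      <xsl:mode on-no-match="shallow-copy"/>
      
      <!-- a text node in p that ends with the hyphen and is directly followed by an lb -->
      <xsl:template match="p/text()[matches(., $pattern)][following-sibling::node()[1][self::lb]]">
        <!-- the text before the hyphen -->
        <xsl:value-of select="replace(., $pattern, '')"/>
        <!-- strong if the node right after the lb is a text node starting with an uppercase letter, otherwise weak -->
        <pc force="{if (following-sibling::node()[2][self::text()[matches(., '^\p{Lu}')]]) then 'strong' else 'weak'}">{$sep}</pc>
        <!-- keep the whitespace (line break, indentation) that followed the hyphen -->
        <xsl:value-of select="analyze-string(., $pattern)//*:group[@nr = 2]"/>
      </xsl:template>
    
    </xsl:stylesheet>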
    

    should do. The match pattern, and the test that decides the force value, might need to be more specific if a text node can be followed by <lb/><foo>...</foo>, i.e. if, unlike in your sample, the lb is not always directly followed by a text node (the current test then always yields weak).
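The XSLT 3 features in the answer's stylesheet (xsl:mode, the || operator, the analyze-string() function and the {$sep} text value template) all have XSLT 2.0 equivalents; matches(), replace() and the regex handling already require XSLT 2.0. The following is a sketch of a 2.0 version, not tested against every processor: it uses an explicit identity template, concat(), and xsl:analyze-string with regex-group(). It should produce the same output for the sample.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all">

  <xsl:param name="sep" as="xs:string">-</xsl:param>

  <!-- concat() instead of the || operator -->
  <xsl:param name="pattern" as="xs:string" select="concat('(', $sep, ')(\s*)$')"/>

  <!-- identity template instead of xsl:mode on-no-match="shallow-copy" -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p/text()[matches(., $pattern)][following-sibling::node()[1][self::lb]]">
    <!-- computed here: inside xsl:analyze-string the context item is a string, not this text node -->
    <xsl:variable name="force" as="xs:string"
      select="if (following-sibling::node()[2][self::text()[matches(., '^\p{Lu}')]]) then 'strong' else 'weak'"/>
    <xsl:analyze-string select="." regex="{$pattern}">
      <xsl:matching-substring>
        <!-- group 1 is the hyphen, group 2 the trailing whitespace -->
        <pc force="{$force}"><xsl:value-of select="regex-group(1)"/></pc>
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- the text before the hyphen -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the regex is anchored with $, it can match only once, at the end of the text node, so the non-matching substring is everything before the hyphen.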
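If the lb can be followed by markup, for example a hypothetical <lb/><hi>Sachen</hi>, testing only following-sibling::node()[2] classifies every such break as weak. One possible adjustment, sketched below under the assumption that the continuation of the word is the first non-whitespace text node after the lb in document order (wherever it is nested), is to test that text node instead. The variable name next-text is only illustrative. Note that if the lb is the last node in the p, this looks into whatever text follows the paragraph.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">

  <xsl:param name="sep" as="xs:string">-</xsl:param>

  <xsl:param name="pattern" as="xs:string" select="'(' || $sep || ')' || '(\s*)$'"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="p/text()[matches(., $pattern)][following-sibling::node()[1][self::lb]]">
    <!-- first non-whitespace text node after the lb, even if it is inside an element -->
    <xsl:variable name="next-text" as="text()?"
      select="following-sibling::node()[1]/following::text()[normalize-space()][1]"/>
    <xsl:value-of select="replace(., $pattern, '')"/>
    <!-- an empty $next-text makes matches() return false, so force falls back to weak -->
    <pc force="{if (matches($next-text, '^\s*\p{Lu}')) then 'strong' else 'weak'}">{$sep}</pc>
    <xsl:value-of select="analyze-string(., $pattern)//*:group[@nr = 2]"/>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input, where each lb is followed directly by a text node, this picks the same text node as the original test, so the output should not change.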