xml xslt tei

Concatenate text nodes by milestone element using XSLT 2.0


My XML files have TEI milestone elements like <handShift new="#DP1053"/>. There is always one at the start of the <text> content; after that, a handful of further tags of this kind, carrying one of two or three distinct attribute values, may be scattered through the <text> element to indicate where a particular scribe picks up after the last. The value of attribute @new points to an @xml:id defined in the TEI header, on a <handNote/> element.

My aim in XSLT 2.0 is to concatenate the text written by each scribe so I can query each scribe's work independently. I wonder whether a recommended solution would entail group-starting-with, but I haven't yet wrapped my head around the preprocessing involved (I'd be grateful for pointers). Instead, my own instinct is to perform

  • a for-each loop iterating over the scribal hands, running
  • a string-join
  • on all text nodes
  • where a preceding <handShift/> with a value of attribute @new matching the hand processed in the current loop iteration is nearer than a preceding <handShift/> where the attribute value does not match.

My trial syntax in an XSLT 2.0 stylesheet transforming to HTML is as follows:

<xsl:for-each select="//tei:handNote[@xml:id != '']">
    <xsl:variable name="hand" select="./@xml:id"/>
    <p><xsl:value-of select="$hand"/>: <xsl:value-of select="string-join(//tei:text//text()[preceding-sibling::tei:handShift[@new = concat('#',$hand)] &gt;&gt; preceding-sibling::tei:handShift[@new != concat('#',$hand)]])"/></p>
</xsl:for-each>

However, this only returns the text node(s) following the final milestone in the text, and only in the for-each iteration that selects for the attribute value matching that final milestone. I've surely got the >> statement wrong and would be grateful for any advice either with this approach or for a different, grouping-based approach.

I should probably mention that once I've mastered this concatenation, I'll have to bring <add hand="#DP1054">addition</add>-type content (i.e. revisions by hands other than that of the current stint) into the equation: nonmatching content of this kind must be excluded, and matching content situated within nonmatching scribal stints must be included, though I don't foresee the latter having to land in the "correct" place in the concatenation. I should be able to account for these in two fairly straightforward additional steps, but the initial concatenation or grouping solution has to allow for the exclusion of nodes with nonmatching attribute values, and of any other elements I may wish to exclude (e.g. <expan> in the example below).

Here is a mock XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>
  <fileDesc/>
   <sourceDesc>
    <msDesc>
     <physDesc>
      <handDesc>
       <handNote xml:id="DP1054"/>
       <handNote xml:id="DP1053"/>
      </handDesc>
     </physDesc>
    </msDesc>
   </sourceDesc>
 </teiHeader>
 <text>
  <body>
    <p><handShift new="#DP1054"/>I'LL REPRESENT THE WORK OF HAND 1054 IN ALLCAPS <handShift new="#DP1053"/>and I'll represent the work of hand 1053 in lowercase <handShift new="#DP1054"/>THE IDEA BEING THAT IN THE END ALL UPPERCASE TEXT SHOULD BE CONCATENATED <handShift new="#DP1053"/>separately from the sentence case content. Of course reality is a little more <add hand="#DP1054">COMPLEX</add>: we have <hi rend="color(green)">other nodes intervening</hi>, <handShift new="#DP1054"/>AND I WONDER WHETHER THESE WILL MESS WITH THE CONCEPT OF <choice>
     <abbr>SBLS</abbr>
     <expan>S<ex>I</ex>BL<ex>ING</ex>S</expan>
    </choice> <handShift new="#DP1053"/> (I will filter out nodes with `tei:expan` ancestors and nonmatching `add` elements; that's not the part I am having difficulty with).</p>
  </body>
 </text>
</TEI>

Solution

  • I think group-starting-with can help. Here is an example that stores the result in an XPath 3.1 map: the grouping produces a sequence of maps, one per stint, and the map:merge function merges them into a single map from each hand id to all nodes that follow a handShift with that id.

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:map="http://www.w3.org/2005/xpath-functions/map"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xpath-default-namespace="http://www.tei-c.org/ns/1.0"
        exclude-result-prefixes="#all"
        version="3.0">
    
      <xsl:output method="html" indent="yes" html-version="5"/>
    
      <xsl:variable name="note-map-sequence" as="map(xs:string, node()*)*">
          <xsl:for-each-group select="//body/p/node()" group-starting-with="handShift">
              <xsl:map-entry key="substring(@new, 2)" select="current-group()"/>
          </xsl:for-each-group>
      </xsl:variable>
    
      <xsl:variable name="note-map" as="map(xs:string, node()*)"
        select="map:merge($note-map-sequence, map { 'duplicates' : 'combine' })"/>
    
      <xsl:template match="/">
        <html>
          <head>
            <title>.NET XSLT Fiddle Example</title>
          </head>
          <body>
            <xsl:apply-templates select="//handNote"/>
          </body>
        </html>
      </xsl:template>
    
      <xsl:template match="handNote">
          <p>
            <xsl:value-of select="@xml:id"/>: 
            <xsl:apply-templates select="$note-map(@xml:id)"/>
          </p>
      </xsl:template>
    
    </xsl:stylesheet>
    

    https://xsltfiddle.liberty-development.net/bFWRApk has an online sample outputting

    <!DOCTYPE HTML>
    <html>
       <head>
          <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>.NET XSLT Fiddle Example</title></head>
       <body>
          <p>DP1054: 
             I'LL REPRESENT THE WORK OF HAND 1054 IN ALLCAPS THE IDEA BEING THAT IN THE END ALL UPPERCASE TEXT SHOULD BE CONCATENATED AND I WONDER WHETHER THESE WILL MESS WITH THE CONCEPT OF 
             SBLS
             SIBLINGS
    
          </p>
          <p>DP1053: 
             and I'll represent the work of hand 1053 in lowercase separately from the sentence case content. Of course reality is a little more COMPLEX: we have other nodes intervening,  (I will filter out nodes with `tei:expan` ancestors and nonmatching `add` elements;
             that's not the part I am having difficulty with).
          </p>
       </body>
    </html>
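
    The sample output still contains both SBLS and SIBLINGS, and the addition COMPLEX by DP1054 ends up in the paragraph for DP1053. Because the handNote template pushes the grouped nodes through xsl:apply-templates, such content can probably be filtered with a few extra templates. The following is only a sketch under assumptions the question does not confirm: the tunnel parameter name hand is made up for illustration, an <add> without @hand is treated as belonging to the current stint, and matching additions located inside another hand's stint are not collected here.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:map="http://www.w3.org/2005/xpath-functions/map"
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xpath-default-namespace="http://www.tei-c.org/ns/1.0"
        exclude-result-prefixes="#all"
        version="3.0">

      <xsl:output method="html" indent="yes" html-version="5"/>

      <!-- Same grouping as before: one map per stint, keyed by hand id -->
      <xsl:variable name="note-map-sequence" as="map(xs:string, node()*)*">
        <xsl:for-each-group select="//body/p/node()" group-starting-with="handShift">
          <xsl:map-entry key="substring(@new, 2)" select="current-group()"/>
        </xsl:for-each-group>
      </xsl:variable>

      <xsl:variable name="note-map" as="map(xs:string, node()*)"
        select="map:merge($note-map-sequence, map { 'duplicates' : 'combine' })"/>

      <xsl:template match="/">
        <html>
          <head><title>Text by hand</title></head>
          <body>
            <xsl:apply-templates select="//handNote"/>
          </body>
        </html>
      </xsl:template>

      <xsl:template match="handNote">
        <p>
          <xsl:value-of select="@xml:id"/>
          <xsl:text>: </xsl:text>
          <xsl:apply-templates select="$note-map(string(@xml:id))">
            <!-- 'hand' is a hypothetical parameter name; it tunnels down to the add template -->
            <xsl:with-param name="hand" select="string(@xml:id)" tunnel="yes"/>
          </xsl:apply-templates>
        </p>
      </xsl:template>

      <!-- Drop expansions so that only the abbreviated form is output -->
      <xsl:template match="expan"/>

      <!-- Keep an addition only if it has no @hand (assumption) or its @hand matches -->
      <xsl:template match="add">
        <xsl:param name="hand" as="xs:string" tunnel="yes"/>
        <xsl:if test="not(@hand) or substring(@hand, 2) = $hand">
          <xsl:apply-templates/>
        </xsl:if>
      </xsl:template>

    </xsl:stylesheet>
    ```

    A tunnel parameter is used so that the current hand also reaches <add> elements nested inside other elements such as <hi>, which are handled by the built-in templates.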
    

    XSLT 3 with XPath 3.1 has been available since Saxon 9.8, so most people using Saxon 9 for XSLT 2 should be able to use XSLT 3 as well by using the latest (9.9) or previous (9.8) version of Saxon.

    Of course the map only serves as an elegant and light-weight container for the grouping results. The same for-each-group works in XSLT 2 as well; you would only need to store the grouping result in some intermediary XML instead (e.g. <group id="{substring(@new, 2)}">...</group>, since current-grouping-key() is not set when grouping with group-starting-with).
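
    For XSLT 2, the intermediary XML could look roughly as follows. This is a sketch and has not been run against the real files; the variable name stints is an assumption. The <group> wrapper is created in no namespace, so it is selected through $stints/* rather than by name, because with xpath-default-namespace set an unprefixed name test such as group would look in the TEI namespace.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xpath-default-namespace="http://www.tei-c.org/ns/1.0"
        exclude-result-prefixes="#all"
        version="2.0">

      <xsl:output method="html" indent="yes"/>

      <!-- Temporary tree: one <group> per scribal stint, labelled with the hand id -->
      <xsl:variable name="stints">
        <xsl:for-each-group select="//body/p/node()" group-starting-with="handShift">
          <!-- The context item is the first node of the group, normally the handShift -->
          <group id="{substring(@new, 2)}">
            <xsl:copy-of select="current-group()"/>
          </group>
        </xsl:for-each-group>
      </xsl:variable>

      <xsl:template match="/">
        <html>
          <head><title>Text by hand</title></head>
          <body>
            <xsl:apply-templates select="//handNote"/>
          </body>
        </html>
      </xsl:template>

      <xsl:template match="handNote">
        <xsl:variable name="hand" select="string(@xml:id)"/>
        <p>
          <xsl:value-of select="$hand"/>
          <xsl:text>: </xsl:text>
          <!-- All stints of this hand, in document order; built-in templates output their text -->
          <xsl:apply-templates select="$stints/*[@id = $hand]"/>
        </p>
      </xsl:template>

    </xsl:stylesheet>
    ```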