I have some XML like this:
<TEI>
<text>
<div type="scene" n="1">
<sp xml:id="sp1">
<speaker>Julius</speaker>
<l>Lorem ipsum dolor sit amet</l>
<ptr cRef="..." />
<stage>Aside</stage>
<ptr cRef="..." />
<l>consectetur adipisicing elit</l>
<stage>To Antony</stage>
<l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>
</sp>
<sp xml:id="sp2">
...
And I need to lift all the <stage> elements up one level to become siblings of the <sp>s, breaking the <sp>s up so that the <stage> elements retain their preceding and following relations with the other elements inside the <sp>, e.g.
<TEI>
<text>
<div type="scene" n="1">
<sp by="#Julius">
<l>Lorem ipsum dolor sit amet</l>
<ptr cRef="..." />
</sp>
<stage>Aside</stage>
<sp by="#Julius">
<ptr cRef="..." />
<l>consectetur adipisicing elit</l>
</sp>
<stage>To Antony</stage>
<sp by="#Julius">
<l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>
</sp>
I've been working on an XSLT to do this. It includes a recursive template which is intended to consume all the child elements of an <sp> up to (but not including) the first <stage> child and emit them in the result tree as children of a new <sp>. It then emits the first <stage> element and recurses on all the elements following that first <stage> element. Eventually, when the list of child elements has no <stage>s left, all the remaining elements are emitted in the result tree inside a new <sp>. Here's the code, including debugging <xsl:message>s:
<xsl:template name="sp-with-stage">
<!-- call with speaker -->
<xsl:param name="speaker" />
<!-- call with an <sp> element -->
<xsl:param name="sp" />
<!-- $content parameter is optional, by default it's the children of the given $sp; this is the parameter whose value is different with each recursive call -->
<xsl:param name="content" select="$sp/*" />
<!-- find the first <stage> element amongst the $content node set -->
<xsl:variable name="stage" select="$content/following-sibling::stage[1]" />
<xsl:message>ID = <xsl:value-of select="$sp/@xml:id" /></xsl:message>
<xsl:message>speaker = "<xsl:value-of select="$speaker" />"</xsl:message>
<xsl:message>content length = <xsl:value-of select="count($content)" /></xsl:message>
<xsl:if test="$stage">
<xsl:message>nodes before $stage = <xsl:value-of select="count($stage/preceding-sibling::*)" /></xsl:message>
<xsl:message>nodes after $stage = <xsl:value-of select="count($stage/following-sibling::*)" /></xsl:message>
</xsl:if>
<xsl:if test="$stage">
<sp by="#{$speaker}">
<!-- process all the nodes in the $content node set before the current <stage> -->
<xsl:message>Processing <xsl:value-of select="count($stage/preceding-sibling::*)" /> nodes before "<xsl:value-of select="$stage/text()" />"</xsl:message>
<xsl:apply-templates select="$stage/preceding-sibling::*" />
</sp>
<xsl:apply-templates select="$stage" />
</xsl:if>
<xsl:choose>
<xsl:when test="$stage/following-sibling::stage">
<!-- if there's another <stage> element in the $content node set then call this template recursively -->
<xsl:message>Call recursively with <xsl:value-of select="count($stage/following-sibling::*)" /> following nodes</xsl:message>
<xsl:call-template name="sp-with-stage">
<xsl:with-param name="speaker"><xsl:value-of select="$speaker" /></xsl:with-param>
<xsl:with-param name="sp" select="$sp" />
<!-- the $content node set for this call is all the nodes after the current <stage> -->
<xsl:with-param name="content" select="$stage/following-sibling::*" />
</xsl:call-template>
</xsl:when>
<xsl:when test="$stage/following-sibling::*">
<!-- if there's no <stage> element in the $content node set, but there are still some elements, emit them in an <sp> element -->
<sp by="#{$speaker}">
<xsl:message>Processing <xsl:value-of select="count($stage/following-sibling::*)" /> trailing nodes</xsl:message>
<xsl:apply-templates select="$stage/following-sibling::*" />
</sp>
</xsl:when>
</xsl:choose>
</xsl:template>
This template is then called like this:
<xsl:template match="sp[stage]">
<xsl:call-template name="sp-with-stage">
<xsl:with-param name="speaker"><xsl:value-of select="speaker" /></xsl:with-param>
<xsl:with-param name="sp" select="." />
</xsl:call-template>
</xsl:template>
The problem is with my use of $stage/preceding-sibling::*, by which I mean to process just the nodes from the current $content node set that precede the current $stage node. What actually happens is that, in every recursive call, all of the nodes which preceded the current $stage node in its original <sp> context are selected by $stage/preceding-sibling::*. This is despite the fact that the recursive calls get the correct new $content node set each time and that the $stage node is being taken from that correct $content node set.
To clarify, in the case of the above example XML, when <stage>To Antony</stage> is the $stage node and the $content node set contains just:
<l>consectetur adipisicing elit</l>
<stage>To Antony</stage>
<l>sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</l>
the $stage/preceding-sibling::* expression still yields all the children of the original <sp> up to <stage>To Antony</stage>.
I guess there must be something about preceding-sibling that I'm not properly understanding. Any suggestions? Or even any suggestions of completely different ways to achieve the transformation?
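On the preceding-sibling behaviour itself: an XPath axis is always evaluated against the source tree, starting from the context node. A variable such as $content only holds references to nodes in that tree, so $stage/preceding-sibling::* returns every preceding sibling of $stage in the original <sp>, whether or not those nodes are members of $content. The same applies to $content/following-sibling::stage[1], which yields the first following <stage> for each node in $content rather than the first <stage> within $content. A minimal sketch of the recursive approach that restricts the axis result to $content could look like the following. It assumes XSLT 1.0, uses the usual count(. | $set) = count($set) idiom to test node-set membership, and the template name split-sp and variable name before are made up for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" />

  <!-- identity template: copy everything as-is unless overridden -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <xsl:template match="sp[stage]">
    <xsl:call-template name="split-sp">
      <xsl:with-param name="speaker" select="string(speaker)" />
      <!-- start with every child element except the speaker -->
      <xsl:with-param name="content" select="*[not(self::speaker)]" />
    </xsl:call-template>
  </xsl:template>

  <!-- hypothetical name; $content shrinks with each recursive call -->
  <xsl:template name="split-sp">
    <xsl:param name="speaker" />
    <xsl:param name="content" />
    <!-- the first stage that is itself a member of $content -->
    <xsl:variable name="stage" select="$content[self::stage][1]" />
    <xsl:choose>
      <xsl:when test="$stage">
        <!-- preceding siblings of $stage, kept only if they are also in $content -->
        <xsl:variable name="before"
          select="$stage/preceding-sibling::*[count(. | $content) = count($content)]" />
        <xsl:if test="$before">
          <sp by="#{$speaker}"><xsl:apply-templates select="$before" /></sp>
        </xsl:if>
        <xsl:apply-templates select="$stage" />
        <!-- recurse on everything after the current stage -->
        <xsl:call-template name="split-sp">
          <xsl:with-param name="speaker" select="$speaker" />
          <xsl:with-param name="content" select="$stage/following-sibling::*" />
        </xsl:call-template>
      </xsl:when>
      <xsl:when test="$content">
        <!-- no stage left: emit the remaining elements in one sp -->
        <sp by="#{$speaker}"><xsl:apply-templates select="$content" /></sp>
      </xsl:when>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Because an empty $before produces no <sp>, two adjacent <stage> elements, or a <stage> at the very end of an <sp>, should simply be copied through without an empty <sp> between or after them.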
This is a grouping problem - you want to group together all the elements inside each sp (except speaker and stage) by their closest preceding stage (if there is one). The standard approach to this in XSLT 1.0 is called Muenchian grouping. You define a key giving the grouping criteria and then use a generate-id trick to process the first node in each group as a proxy for the group as a whole.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*"/>
<xsl:output indent="yes" />
<!-- group first by the parent sp and then by the nearest preceding stage.
generate-id(emptynodeset) is the empty string by definition, so this
is still well defined for the elements before the first stage in an sp -->
<xsl:key name="groupKey" match="sp/*[not(self::speaker | self::stage)]" use="
concat(generate-id(..), '|', generate-id(preceding-sibling::stage[1]))" />
<!-- identity template - copy everything as-is unless overridden -->
<xsl:template match="@*|node()">
<xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
</xsl:template>
<xsl:template match="sp">
<!-- for each group -->
<xsl:for-each select="*[generate-id() = generate-id(key('groupKey',
concat(generate-id(..), '|', generate-id(preceding-sibling::stage[1]))
)[1])]">
<!-- the "stage" if there is one - if we are before the first stage in this
sp then the preceding-sibling:: will select nothing -->
<xsl:apply-templates select="preceding-sibling::stage[1]" />
<sp by="#{../speaker}">
<!-- the following elements up to the next stage -->
<xsl:apply-templates select="key('groupKey',
concat(generate-id(..), '|', generate-id(preceding-sibling::stage[1]))
)" />
</sp>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
This works on your example input but may need some alterations if there are any instances where you have two consecutive stage elements with nothing else in between them.
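One possible alteration for that case, offered as a sketch rather than a tested drop-in: keep the same key, but drive the loop from the stage elements themselves instead of from the first member of each group, so that a stage with no following content is still copied. The named template emit-group is a made-up helper, and the sketch assumes that an sp containing nothing but a speaker can be dropped from the output.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" />

  <!-- same key as above: parent sp plus nearest preceding stage -->
  <xsl:key name="groupKey" match="sp/*[not(self::speaker | self::stage)]"
    use="concat(generate-id(..), '|', generate-id(preceding-sibling::stage[1]))" />

  <!-- identity template: copy everything as-is unless overridden -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <xsl:template match="sp">
    <!-- elements before the first stage: their key ends in the empty string -->
    <xsl:call-template name="emit-group">
      <xsl:with-param name="group"
        select="key('groupKey', concat(generate-id(), '|'))" />
    </xsl:call-template>
    <!-- then every stage in turn, followed by the group it heads (if any) -->
    <xsl:for-each select="stage">
      <xsl:apply-templates select="." />
      <xsl:call-template name="emit-group">
        <xsl:with-param name="group"
          select="key('groupKey', concat(generate-id(..), '|', generate-id()))" />
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>

  <!-- hypothetical helper: wrap a non-empty group in a new sp -->
  <xsl:template name="emit-group">
    <xsl:param name="group" />
    <xsl:if test="$group">
      <sp by="#{$group[1]/../speaker}">
        <xsl:apply-templates select="$group" />
      </sp>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

With this variant the generate-id comparison in the for-each is no longer needed, since each stage identifies its own group directly through the key.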