XSLT breaking up XML text based on the value of an attribute of a containing element doesn't work in TEI

I have some XSLT code that breaks up the text of 'p' (paragraph) elements in an XML file into 'w' (word) elements, based on spaces in the string. However, I only want this to affect 'p' elements whose @xml:lang attribute has the value 'arn'. (I also want the new 'w' elements to inherit the xml:lang="arn" attribute and value, but that's secondary.) I've modified the code by adding "p[@xml:lang='arn']/text()" to my template's match pattern. This works fine for a plain XML file, but as soon as I try to convert a TEI file, the file comes back unchanged.

Here is my input:

<?xml version="1.0" encoding="UTF-8"?>
<text>
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
</text>

And my XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:template match="@*|node()" priority="-1">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="p[@xml:lang='arn']/text()[normalize-space()]">
        <xsl:variable name='orig' select="."/>
        <xsl:variable name='lang' select="$orig/ancestor::*[normalize-space(@xml:lang)][1]/@xml:lang"/>
        <xsl:analyze-string select="." regex="[\p{{L}}\p{{N}}]+">
            <xsl:matching-substring>
                <xsl:element name="w">
                    <xsl:attribute name="xml:lang"><xsl:value-of select="$lang"/></xsl:attribute>
                    <xsl:value-of select="."/>
                </xsl:element>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>

This gives me the desired output:

<?xml version="1.0" encoding="UTF-8"?><text>
    <p xml:lang="arn"><w xml:lang="arn">Fei</w> <w xml:lang="arn">meu</w> <w xml:lang="arn">nùkei</w> <w xml:lang="arn">neməl</w> <w xml:lang="arn">təfa</w></p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
    <p xml:lang="arn"><w xml:lang="arn">Fei</w> <w xml:lang="arn">meu</w> <w xml:lang="arn">nùkei</w> <w xml:lang="arn">neməl</w> <w xml:lang="arn">təfa</w></p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
</text>

However, when the input has a TEI root element with a namespace declaration, as follows, I just get the input file back unchanged.

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
</TEI>

Any suggestions to avoid this?


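  • In the second version, the root element declares the default namespace "http://www.tei-c.org/ns/1.0", so the root and all of its unprefixed descendants, including the 'p' elements, are in that namespace. The unprefixed 'p' in your match pattern only matches 'p' elements in no namespace, so the template never fires and the identity template copies everything through unchanged.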

    One simple solution is to add the attribute

    xpath-default-namespace="http://www.tei-c.org/ns/1.0"

    to the xsl:stylesheet element of your XSLT.
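
Applied to the stylesheet from the question, the fix might look like the sketch below. It assumes an XSLT 2.0 processor (Saxon, for example) and that the input uses the standard TEI namespace, http://www.tei-c.org/ns/1.0. The `namespace` attribute on `xsl:element` is an addition that goes beyond the answer above: `xpath-default-namespace` only affects XPath expressions and match patterns, so without it the new `w` elements would be created in no namespace rather than as TEI `w` elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

    <!-- Identity template: copy everything that is not handled below -->
    <xsl:template match="@*|node()" priority="-1">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Unprefixed "p" now refers to the TEI p element,
         because of xpath-default-namespace on xsl:stylesheet -->
    <xsl:template match="p[@xml:lang='arn']/text()[normalize-space()]">
        <xsl:variable name="lang"
            select="ancestor::*[normalize-space(@xml:lang)][1]/@xml:lang"/>
        <xsl:analyze-string select="." regex="[\p{{L}}\p{{N}}]+">
            <xsl:matching-substring>
                <!-- Assumption: w should be a TEI element, so it is
                     created in the TEI namespace explicitly -->
                <xsl:element name="w" namespace="http://www.tei-c.org/ns/1.0">
                    <xsl:attribute name="xml:lang" select="$lang"/>
                    <xsl:value-of select="."/>
                </xsl:element>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <!-- Spaces and punctuation are passed through as-is -->
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
```

Because `xpath-default-namespace` applies to every unprefixed element name in the stylesheet's patterns and expressions, this version would no longer match the namespace-free input from the first example. If both kinds of input need to be handled, one option is to bind a prefix to the TEI namespace and match `tei:p` and `p` separately.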