
XSLT breaking up XML text based on the value of an attribute of a containing element doesn't work in TEI


I have an XSLT stylesheet that breaks up the text of 'p' (paragraph) elements in an XML file into 'w' (word) elements, based on spaces in the string. However, I only want this to affect 'p' elements whose @xml:lang attribute has the value 'arn'. (I also want the new 'w' elements to inherit xml:lang="arn", but that's secondary.) I've restricted the template by starting its match pattern with p[@xml:lang='arn']/text(). This works fine for a plain XML file, but as soon as I try to convert a TEI file, the file comes back unchanged.

Here is my input:

<?xml version="1.0" encoding="UTF-8"?>

<text>
    <body>
<div>
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
</div>

</body></text>

And my XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
    
    <xsl:template match="@*|node()" priority="-1">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    
    <xsl:template match="p[@xml:lang='arn']/text()[normalize-space()]">
        <xsl:variable name='orig' select="."/>
        <xsl:variable name='lang' select="$orig/ancestor::*[normalize-space(@xml:lang)][1]/@xml:lang"/>
        
        <xsl:analyze-string select="." regex="[\p{{L}}\p{{N}}]+">
            <xsl:matching-substring>
                   
                <xsl:element name="w">
                    <xsl:attribute name="xml:lang"><xsl:value-of select="$lang"/></xsl:attribute>
                    <xsl:value-of select="."/>
                </xsl:element>
                
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
    
</xsl:stylesheet>

This gives me the desired output:

<?xml version="1.0" encoding="UTF-8"?><text>
    <body>
<div>
    <p xml:lang="arn"><w xml:lang="arn">Fei</w> <w xml:lang="arn">meu</w> <w xml:lang="arn">nùkei</w> <w xml:lang="arn">neməl</w> <w xml:lang="arn">təfa</w></p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
    <p xml:lang="arn"><w xml:lang="arn">Fei</w> <w xml:lang="arn">meu</w> <w xml:lang="arn">nùkei</w> <w xml:lang="arn">neməl</w> <w xml:lang="arn">təfa</w></p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
</div>

</body></text>

However, when the input has a TEI header, as follows, I get the input file back.

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title></title>
            </titleStmt>
            <publicationStmt><ab></ab></publicationStmt>
            <sourceDesc><ab></ab></sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
    <body>
<div>
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
    <p xml:lang="arn">Fei meu nùkei neməl təfa</p>
    <p xml:lang="spa">Entonces toma la palabra él</p>
</div>

</body></text>
</TEI>

Any suggestions to avoid this?


Solution

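  • In the second version the root element declares the default namespace "http://www.tei-c.org/ns/1.0", so every element in the document, including each 'p', is in that namespace. An unprefixed name in an XSLT match pattern refers to an element in no namespace, so p[@xml:lang='arn'] never matches a TEI 'p'. Only the identity template fires, and the file is copied through unchanged.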

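    One simple solution is to add the attribute

    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    

    to the xsl:stylesheet element of your XSLT, so that unprefixed element names in match patterns and XPath expressions are looked up in the TEI namespace.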
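
For reference, here is a sketch of the question's stylesheet with that attribute in place. It also includes one further change that is not part of the fix above: xpath-default-namespace only affects how names in patterns and expressions are resolved, not the elements the stylesheet creates, so without a default xmlns declaration the new 'w' elements would end up in no namespace (serialized as <w xmlns="" ...>). Declaring xmlns="http://www.tei-c.org/ns/1.0" on the stylesheet and writing 'w' as a literal result element is one way to keep them in the TEI namespace; this assumes that is what you want in the output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.tei-c.org/ns/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs"
    version="2.0">

    <!-- Identity template: copies everything that no other template matches. -->
    <xsl:template match="@*|node()" priority="-1">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Because of xpath-default-namespace, the unprefixed "p" here
         means the TEI p element. Attributes such as @xml:lang are not
         affected by that setting. -->
    <xsl:template match="p[@xml:lang='arn']/text()[normalize-space()]">
        <!-- Look up the language before analyze-string, while the
             context item is still the text node. -->
        <xsl:variable name="lang"
            select="ancestor::*[normalize-space(@xml:lang)][1]/@xml:lang"/>

        <!-- Braces are doubled because the regex attribute is an
             attribute value template. -->
        <xsl:analyze-string select="." regex="[\p{{L}}\p{{N}}]+">
            <xsl:matching-substring>
                <!-- Literal result element: takes the default xmlns
                     declared on xsl:stylesheet, i.e. the TEI namespace. -->
                <w xml:lang="{$lang}"><xsl:value-of select="."/></w>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>

</xsl:stylesheet>
```

Note that with these settings the stylesheet matches only 'p' elements in the TEI namespace, so the first, namespace-free input from the question would now be copied through unchanged.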