I have an XSLT stylesheet that breaks up the text of 'p' (paragraph) elements in an XML file into 'w' (word) elements, based on spaces in the string. However, I only want this to affect 'p' elements whose @xml:lang attribute has the value 'arn'. (I also want the new 'w' elements to inherit the xml:lang="arn" attribute, but that's secondary.) I've restricted the template by using p[@xml:lang='arn']/text() in its match pattern. This works fine for a plain XML file, but as soon as I try to transform a TEI file, the file comes back unchanged.
Here is my input:
<?xml version="1.0" encoding="UTF-8"?>
<text>
<body>
<div>
<p xml:lang="arn">Fei meu nùkei neməl təfa</p>
<p xml:lang="spa">Entonces toma la palabra él</p>
<p xml:lang="arn">Fei meu nùkei neməl təfa</p>
<p xml:lang="spa">Entonces toma la palabra él</p>
</div>
</body></text>
And my XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
exclude-result-prefixes="xs"
version="2.0">
<xsl:template match="@*|node()" priority="-1">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[@xml:lang='arn']/text()[normalize-space()]">
<xsl:variable name='orig' select="."/>
<xsl:variable name='lang' select="$orig/ancestor::*[normalize-space(@xml:lang)][1]/@xml:lang"/>
<xsl:analyze-string select="." regex="[\p{{L}}\p{{N}}]+">
<xsl:matching-substring>
<xsl:element name="w">
<xsl:attribute name="xml:lang"><xsl:value-of select="$lang"/></xsl:attribute>
<xsl:value-of select="."/>
</xsl:element>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
This gives me the desired output:
<?xml version="1.0" encoding="UTF-8"?><text>
<body>
<div>
<p xml:lang="arn"><w xml:lang="arn">Fei</w> <w xml:lang="arn">meu</w> <w xml:lang="arn">nùkei</w> <w xml:lang="arn">neməl</w> <w xml:lang="arn">təfa</w></p>
<p xml:lang="spa">Entonces toma la palabra él</p>
<p xml:lang="arn"><w xml:lang="arn">Fei</w> <w xml:lang="arn">meu</w> <w xml:lang="arn">nùkei</w> <w xml:lang="arn">neməl</w> <w xml:lang="arn">təfa</w></p>
<p xml:lang="spa">Entonces toma la palabra él</p>
</div>
</body></text>
However, when the input has a TEI header, as follows, I get the input file back.
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title></title>
</titleStmt>
<publicationStmt><ab></ab></publicationStmt>
<sourceDesc><ab></ab></sourceDesc>
</fileDesc>
</teiHeader>
<text>
<body>
<div>
<p xml:lang="arn">Fei meu nùkei neməl təfa</p>
<p xml:lang="spa">Entonces toma la palabra él</p>
<p xml:lang="arn">Fei meu nùkei neməl təfa</p>
<p xml:lang="spa">Entonces toma la palabra él</p>
</div>
</body></text>
</TEI>
Any suggestions to avoid this?
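In the second version, your whole XML document is in the default namespace "http://www.tei-c.org/ns/1.0". Because that namespace is declared on the root element, every descendant element is in it as well. An unprefixed p in your match pattern refers to a p in no namespace, so it no longer matches anything and only the identity template fires.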
One simple solution is to add the attribute
xpath-default-namespace="http://www.tei-c.org/ns/1.0"
to the xsl:stylesheet element of your XSLT. This attribute requires XSLT 2.0, which your stylesheet already uses.
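As a sketch, here is the stylesheet from the question with that attribute in place; the template bodies are unchanged. The extra xmlns="http://www.tei-c.org/ns/1.0" declaration is an addition of mine, on the assumption that the new w elements should be in the TEI namespace too: xsl:element with an unprefixed name takes its namespace from the default namespace in scope in the stylesheet, so without that declaration the w elements would be created in no namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.tei-c.org/ns/1.0"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs"
    version="2.0">

  <!-- Identity template: copies everything not matched by a more specific template -->
  <xsl:template match="@*|node()" priority="-1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- With xpath-default-namespace set, the unprefixed "p" now means a TEI p element -->
  <xsl:template match="p[@xml:lang='arn']/text()[normalize-space()]">
    <xsl:variable name="orig" select="."/>
    <xsl:variable name="lang" select="$orig/ancestor::*[normalize-space(@xml:lang)][1]/@xml:lang"/>
    <!-- regex is an attribute value template, hence the doubled braces -->
    <xsl:analyze-string select="." regex="[\p{{L}}\p{{N}}]+">
      <xsl:matching-substring>
        <!-- Assumption: w should be a TEI element; it picks up xmlns declared above -->
        <xsl:element name="w">
          <xsl:attribute name="xml:lang"><xsl:value-of select="$lang"/></xsl:attribute>
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Note that xpath-default-namespace only affects unprefixed element names in XPath expressions and match patterns; it does not change the namespace of elements the stylesheet creates, which is why the separate xmlns declaration is there.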