I'm experimenting with writing an XSLT stylesheet to generate HTML from a text encoded in XML according to the TEI standard.
Now, when it comes to special characters, I'm running into difficulties - here's an example:
The word "ſem" (normalized "sem", the Old Norse relative pronoun) would be encoded as <g ref="#slong"/>em, which refers to the following declaration in the header:
<glyph xml:id="slong">
<glyphName>LATIN SMALL LETTER LONG S</glyphName>
<mapping type="facs">U+017F</mapping>
<mapping type="norm">s</mapping>
</glyph>
Of course, the idea is to look up the mappings for every glyph and then display it accordingly.
E.g. if I wanted a stylesheet that shows a normalized rendering of the text, I'd have something like
<!-- store all my glyphs in a key -->
<xsl:key name="glyphs" match="tei:glyph" use="@xml:id"/>
<!-- handle glyphs, storing every step in a variable for debugging purposes -->
<xsl:template match="tei:g">
<xsl:variable name="g_name" select="substring(@ref,2)"/>
<xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
<xsl:variable name="mapping" select="$glyph/tei:mapping[@type='norm']"/>
<xsl:value-of select="$mapping"/>
</xsl:template>
This would, as expected, output "sem".
But, if I want to write a stylesheet that displays the text diplomatically, I'd want the output to be "ſem".
For that, I started with:
<xsl:template match="tei:g">
<xsl:variable name="g_name" select="substring(@ref,2)"/>
<xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
<xsl:variable name="mapping" select="$glyph/tei:mapping[@type='facs']"/>
<xsl:value-of select="$mapping"/>
</xsl:template>
That gave me "U+017Fem". Of course, that's not an HTML entity for the expected special character.
So I tried:
<xsl:template match="tei:g">
<xsl:variable name="g_name" select="substring(@ref,2)"/>
<xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
<xsl:variable name="mapping" select="$glyph/tei:mapping[@type='facs']"/>
<xsl:variable name="entity" select="concat('&amp;#x',substring($mapping,3),';')"/>
<xsl:value-of select="$entity"/>
</xsl:template>
That outputs &#x017F;em, which looks a lot more like an HTML hex entity. But sadly, it is still displayed literally rather than interpreted as the character the entity represents.
And I can't for the life of me figure out how to get it to do that.
PS: In case it helps: I'm not writing a stylesheet to create an HTML file that I open in the browser afterwards; I have an HTML file with a JavaScript function that converts the XML data to HTML "on the fly".
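Since the conversion already runs in JavaScript, one conceivable workaround is to let the stylesheet emit the reference as plain text and decode it in script afterwards. The following is only a sketch under that assumption; `decodeHexCharRefs` and `decodeTextNodes` are hypothetical helper names, not part of any library, and the approach would also decode any literal text in the source that happens to look like a hexadecimal character reference.

```javascript
// Hypothetical helper: turn hexadecimal character references such as
// "&#x017F;" into the characters they denote.
function decodeHexCharRefs(text) {
  return text.replace(/&#x([0-9A-Fa-f]{1,6});/g, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    // Leave values beyond the Unicode range untouched instead of throwing.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

// Hypothetical helper: apply the decoding to every text node below `root`
// (for example the fragment returned by XSLTProcessor.transformToFragment).
function decodeTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  while (walker.nextNode()) {
    const textNode = walker.currentNode;
    textNode.nodeValue = decodeHexCharRefs(textNode.nodeValue);
  }
}
```

With this, `decodeTextNodes` would be called on the transformation result before it is appended to the page, and the stylesheet would not need `disable-output-escaping` at all.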
Edit:
As pointed out by Martin Honnen, on non-Mozilla browsers <xsl:value-of select="$entity" disable-output-escaping="yes"/> should suffice (see https://xsltfiddle.liberty-development.net/ejivdH4/2).
Yet, for me, that still doesn't work, so I'm guessing I'm missing something important. Here are my full files (file.xml is shortened/changed because the original is work in progress by others, but the result is the same).
file.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Title</title>
</titleStmt>
<publicationStmt>
<p>Publication Information</p>
</publicationStmt>
<sourceDesc>
<p>Information about the source</p>
</sourceDesc>
</fileDesc>
<encodingDesc>
<charDecl>
<desc>Variant letter forms</desc>
<glyph xml:id="aalig">
<glyphName>LATIN SMALL LIGATURE AA</glyphName>
<mapping type="facs">U+EFA0</mapping>
<mapping type="norm">aa</mapping>
</glyph>
<glyph xml:id="fins">
<glyphName>LATIN SMALL LETTER INSULAR F</glyphName>
<mapping type="facs">U+F207</mapping>
<mapping type="norm">f</mapping>
</glyph>
<glyph xml:id="jscap">
<glyphName>LATIN LETTER SMALL CAPITAL J</glyphName>
<mapping type="facs">U+1D0A</mapping>
</glyph>
<glyph xml:id="nscap">
<glyphName>LATIN LETTER SMALL CAPITAL N</glyphName>
<mapping type="facs">U+0274</mapping>
</glyph>
<glyph xml:id="rrot">
<glyphName>LATIN SMALL LETTER R ROTUNDA</glyphName>
<mapping type="facs">U+A75B</mapping>
<mapping type="norm">r</mapping>
</glyph>
<glyph xml:id="rscap">
<glyphName>LATIN LETTER SMALL CAPITAL R</glyphName>
<mapping type="facs">U+0280</mapping>
</glyph>
<glyph xml:id="slong">
<glyphName>LATIN SMALL LETTER LONG S</glyphName>
<mapping type="facs">U+017F</mapping>
<mapping type="norm">s</mapping>
</glyph>
<glyph xml:id="sscap">
<glyphName>LATIN LETTER SMALL CAPITAL S</glyphName>
<mapping type="facs">U+A731</mapping>
</glyph>
</charDecl>
<charDecl>
<desc>Abbreviation marks</desc>
<glyph xml:id="ar">
<glyphName>LATIN ABBREVIATION SIGN</glyphName>
<mapping type="facs">U+036C</mapping>
</glyph>
<glyph xml:id="asup">
<glyphName>COMBINING LATIN SMALL LETTER A</glyphName>
<mapping type="facs">U+0363</mapping>
</glyph>
<glyph xml:id="bar">
<glyphName>COMBINING ABBREVIATION MARK BAR ABOVE</glyphName>
<mapping type="facs">U+0305</mapping>
</glyph>
<glyph xml:id="combcurl">
<glyphName>COMBINING OGONEK ABOVE</glyphName>
<mapping type="facs">U+1DCE</mapping>
</glyph>
<glyph xml:id="csup">
<glyphName>COMBINING LATIN SMALL LETTER C</glyphName>
<mapping type="facs">U+0368</mapping>
</glyph>
<glyph xml:id="dot">
<glyphName>DOT ABOVE</glyphName>
<mapping type="facs">U+02D9</mapping>
</glyph>
<glyph xml:id="dsup">
<glyphName>COMBINING LATIN SMALL LETTER D</glyphName>
<mapping type="facs">U+0369</mapping>
</glyph>
<glyph xml:id="er">
<glyphName>COMBINING ABBREVIATION MARK ZIGZAG ABOVE</glyphName>
<mapping type="facs">U+035B</mapping>
</glyph>
<glyph xml:id="et">
<glyphName>LATIN ABBREVIATION SIGN SMALL ET WITH STROKE</glyphName>
<mapping type="facs">U+F158</mapping>
<mapping type="norm">&amp;</mapping>
</glyph>
<glyph xml:id="ezh">
<glyphName>LATIN SMALL LETTER EZH</glyphName>
<mapping type="facs">U+0292</mapping>
</glyph>
<glyph xml:id="isup">
<glyphName>COMBINING LATIN SMALL LETTER I</glyphName>
<mapping type="facs">U+0365</mapping>
</glyph>
<glyph xml:id="nsup">
<glyphName>COMBINING LATIN SMALL LETTER N</glyphName>
<mapping type="facs">U+F021</mapping>
</glyph>
<glyph xml:id="osup">
<glyphName>COMBINING LATIN SMALL LETTER O</glyphName>
<mapping type="facs">U+0366</mapping>
</glyph>
<glyph xml:id="ra">
<glyphName>COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE</glyphName>
<mapping type="facs">U+F1C1</mapping>
</glyph>
<glyph xml:id="rsup">
<glyphName>COMBINING LATIN SMALL LETTER R</glyphName>
<mapping type="facs">U+036C</mapping>
</glyph>
<glyph xml:id="tsup">
<glyphName>COMBINING LATIN SMALL LETTER T</glyphName>
<mapping type="facs">U+036D</mapping>
</glyph>
<glyph xml:id="ur">
<glyphName>COMBINING ABBREVIATION MARK SUPERSCRIPT UR ROUND R FORM</glyphName>
<mapping type="facs">U+F153</mapping>
</glyph>
<glyph xml:id="us">
<glyphName>COMBINING US ABOVE</glyphName>
<mapping type="facs">U+1DD2</mapping>
</glyph>
<glyph xml:id="zsup">
<glyphName>COMBINING LATIN SMALL LETTER Z</glyphName>
<mapping type="facs">U+00B3</mapping>
</glyph>
</charDecl>
</encodingDesc>
</teiHeader>
<text>
<body>
<!-- Add your data between here ... -->
<div type="miracle" n="75">
<pb n="473"/>
<head> <lb n="2"/>Bla</head>
<p>
<g ref="#slong"/>em
</p>
</div>
</body>
</text>
</TEI>
page.html:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<script>
function loadXMLDoc(filename)
{
if (window.ActiveXObject)
{
xhttp = new ActiveXObject("Msxml2.XMLHTTP");
}
else
{
xhttp = new XMLHttpRequest();
}
xhttp.open("GET", filename, false);
try {xhttp.responseType = "msxml-document"} catch(err) {} // Helping IE11
xhttp.send("");
return xhttp.responseXML;
}
function displayResult(style)
{
console.log('Generating...');
xml = loadXMLDoc("file.xml");
xsl = loadXMLDoc(style);
// code for IE
if (window.ActiveXObject || xhttp.responseType == "msxml-document")
{
ex = xml.transformNode(xsl);
document.getElementById("example").innerHTML = ex;
}
// code for Chrome, Firefox, Opera, etc.
else if (document.implementation && document.implementation.createDocument)
{
xsltProcessor = new XSLTProcessor();
xsltProcessor.importStylesheet(xsl);
resultDocument = xsltProcessor.transformToFragment(xml, document);
const node = document.getElementById("example");
while (node.firstChild){
node.removeChild(node.firstChild);
}
node.appendChild(resultDocument);
}
}
</script>
</head>
<body onload="displayResult('facs.xsl')">
<h1>Test</h1>
<div>
<button onclick="displayResult('facs.xsl')">facs</button>
<button onclick="displayResult('dipl.xsl')">dipl</button>
</div>
<div id="example" />
</body>
</html>
facs.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0">
<xsl:key name="glyphs" match="tei:glyph" use="@xml:id"/>
<xsl:template match="/">
<h3>TEI Rendering: Facsimile</h3>
<div>
<xsl:apply-templates select="//tei:div[@type='miracle']"/>
</div>
</xsl:template>
<xsl:template match="tei:div[@type='miracle']">
<h5>
Miracle:
<xsl:value-of select="@n"/>
</h5>
<div class="miracle">
<xsl:apply-templates/>
</div>
</xsl:template>
<xsl:template match="tei:head">
<div style="color:red">
<xsl:apply-templates/>
</div>
</xsl:template>
<xsl:template match="tei:pb">
<br/>
(<xsl:value-of select="@n"/>)
<br/>
</xsl:template>
<xsl:template match="tei:lb">
<br/><xsl:value-of select="@n"/>:
</xsl:template>
<xsl:template match="tei:am">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="tei:g">
<xsl:variable name="g_name" select="substring(@ref,2)"/>
<xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
<xsl:variable name="mapping" select="$glyph/tei:mapping[@type='facs']"/>
<xsl:variable name="entity" select="concat('&amp;#x',substring($mapping,3),';')"/>
<xsl:value-of select="$entity" disable-output-escaping="yes"/>
<xsl:variable name="something" select="'&amp;#x0305;'"/>
{<xsl:value-of select="$something" disable-output-escaping="yes"/>}
</xsl:template>
</xsl:stylesheet>
dipl.xsl:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0">
<xsl:template match="/">
<h3>TEI Rendering: Diplomatic</h3>
<div>
<xsl:apply-templates select="//tei:div[@type='miracle']"/>
</div>
</xsl:template>
<xsl:template match="tei:div[@type='miracle']">
<h5>
Miracle:
<xsl:value-of select="@n"/>
</h5>
<div class="miracle">
<xsl:apply-templates/>
</div>
</xsl:template>
<xsl:template match="tei:head">
<div style="color:red">
<xsl:apply-templates/>
</div>
</xsl:template>
<xsl:template match="tei:pb">
||
</xsl:template>
<xsl:template match="tei:lb">
|
</xsl:template>
<xsl:template match="tei:ex">
<i>
<xsl:apply-templates/>
</i>
</xsl:template>
</xsl:stylesheet>
I'm viewing the page via localhost (with a Python server running) in my browser.
Any thoughts on what I might be missing or doing wrong?
Note: A lookup table is not what I want, because potentially there might be as many special characters in a TEI XML file as there are Unicode characters. That's what the glyph mappings are there for.
XSLT 2.0 might be an option, but I haven't figured out how to do a 2.0 transformation in the browser via JavaScript.
Edit 2:
I don't know what went wrong when I first tested it, but in IE it works with <xsl:value-of select="$entity" disable-output-escaping="yes"/>.
But since it doesn't work in Firefox, I decided to change the whole design: I transform the XML on the server side with PHP and send the HTML to the client; that should work with every browser.
If you target Chrome, Edge, or IE, then I think using <xsl:value-of select="$entity" disable-output-escaping="yes"/> will suffice; in https://xsltfiddle.liberty-development.net/ejivdH4/2 that works to output ſem for the first two browsers and the hexadecimal character reference &#x017F;em for IE, with the transformation done in the browser using the JavaScript API.
Mozilla browsers are known not to support disable-output-escaping, so for cross-browser, client-side XSLT 1 the suggestion "to construct your own lookup table" by michael.hor257k is probably the better option.
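Another possibility, given that the XML is loaded through JavaScript anyway, might be to resolve the U+XXXX notation in the loaded document before it is handed to the XSLT processor, so that a plain <xsl:value-of select="$mapping"/> already yields the character. This is only a sketch under that assumption: `uPlusToChar` and `resolveFacsMappings` are hypothetical helper names, and the code relies on standard DOM methods, so it is aimed at the `XSLTProcessor` branch rather than the MSXML one.

```javascript
const TEI_NS = "http://www.tei-c.org/ns/1.0";

// Hypothetical helper: "U+017F" -> "ſ". Anything that is not well-formed
// U+ notation (e.g. a typo containing non-hex characters) is returned unchanged.
function uPlusToChar(notation) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation.trim());
  if (!match) return notation;
  const codePoint = parseInt(match[1], 16);
  return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : notation;
}

// Hypothetical helper: rewrite the text of every <mapping type="facs">
// in the loaded TEI document, in place, and return the document.
function resolveFacsMappings(xmlDoc) {
  const mappings = Array.from(xmlDoc.getElementsByTagNameNS(TEI_NS, "mapping"));
  for (const mapping of mappings) {
    if (mapping.getAttribute("type") === "facs") {
      mapping.textContent = uPlusToChar(mapping.textContent);
    }
  }
  return xmlDoc;
}
```

In the page script this would amount to something like `xml = resolveFacsMappings(loadXMLDoc("file.xml"));` before calling `transformToFragment`, with neither a hand-written table nor disable-output-escaping in the stylesheet.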