
How to create a HTML hex entity with XSLT


I'm experimenting with writing an XSLT stylesheet to generate HTML from a text encoded in XML according to the TEI standard.

Now, when it comes to special characters, I'm running into difficulties. Here's an example: the word "ſem" (normalized "sem", the Old Norse relative pronoun) would be encoded as <g ref="#slong"/>em, which refers to the following declaration in the header:

<glyph xml:id="slong">
   <glyphName>LATIN SMALL LETTER LONG S</glyphName>
   <mapping type="facs">U+017F</mapping>
   <mapping type="norm">s</mapping>
</glyph>

The idea, of course, is to look up the mappings for every glyph and then display it accordingly.
E.g., if I wanted a stylesheet that shows a normalized rendering of the text, I'd have something like

<!-- store all my glyphs in a key -->
<xsl:key name="glyphs" match="tei:glyph" use="@xml:id"/>

<!-- handle glyphs, storing every step in a variable for debugging purposes -->
<xsl:template match="tei:g">
   <xsl:variable name="g_name" select="substring(@ref,2)"/>
   <xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
   <xsl:variable name="mapping" select="$glyph/tei:mapping[@type='norm']"/>
   <xsl:value-of select="$mapping"/>
</xsl:template>

This would, as expected, output "sem".

But, if I want to write a stylesheet that displays the text diplomatically, I'd want the output to be "ſem".
For that, I started with:

<xsl:template match="tei:g">
   <xsl:variable name="g_name" select="substring(@ref,2)"/>
   <xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
   <xsl:variable name="mapping" select="$glyph/tei:mapping[@type='facs']"/>
   <xsl:value-of select="$mapping"/>
</xsl:template>

That gave me "U+017Fem". Of course, that's not an HTML entity for the expected special character.

So I tried:

<xsl:template match="tei:g">
   <xsl:variable name="g_name" select="substring(@ref,2)"/>
   <xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
   <xsl:variable name="mapping" select="$glyph/tei:mapping[@type='facs']"/>
   <xsl:variable name="entity" select="concat('&amp;#x',substring($mapping,3),';')"/>
   <xsl:value-of select="$entity"/>
</xsl:template>

That outputs &#x017F;em, which looks a lot more like an HTML hex entity. But sadly, it is still displayed literally instead of being interpreted as the character the entity represents.

And I can't for the life of me figure out how to get it to do that.

PS: In case it helps: I'm not writing a stylesheet to create an HTML file that I open in the browser afterwards; I have an HTML file with a JavaScript function that converts the XML data to HTML "on the fly".
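
Since the transformation is driven from JavaScript anyway, one possible workaround is to resolve the "U+XXXX" notation in the loaded XML document before the stylesheet runs, so that the stylesheet only has to output the mapping value. The following is a minimal sketch under that assumption, not a tested cross-browser solution. The function names charFromCodePointNotation and resolveFacsMappings are made up for this example, and the DOM part assumes a standards-based XML document (it would not apply to the "msxml-document" branch used for IE).

```javascript
// Convert a "U+XXXX" code point notation into the character it denotes.
// Returns null for anything that is not "U+" followed by 4 to 6 hex digits
// or that lies outside the Unicode range.
function charFromCodePointNotation(notation) {
  const match = /^U\+([0-9A-Fa-f]{4,6})$/.exec(notation.trim());
  if (match === null) return null;
  const codePoint = parseInt(match[1], 16);
  if (codePoint > 0x10FFFF) return null;
  return String.fromCodePoint(codePoint);
}

// Hypothetical helper: replace the text of every <mapping type="facs">
// in a loaded TEI document with the character it names.
// Values that cannot be parsed are left untouched.
function resolveFacsMappings(xmlDoc) {
  const TEI_NS = "http://www.tei-c.org/ns/1.0";
  const mappings = xmlDoc.getElementsByTagNameNS(TEI_NS, "mapping");
  for (const mapping of Array.from(mappings)) {
    if (mapping.getAttribute("type") !== "facs") continue;
    const ch = charFromCodePointNotation(mapping.textContent);
    if (ch !== null) mapping.textContent = ch;
  }
  return xmlDoc;
}
```

With something like this in place, the document would be passed through resolveFacsMappings right after loading and before the transformation, and the earlier template that simply outputs the 'facs' mapping with xsl:value-of should then emit the character itself, without any need for disable-output-escaping.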

Edit:

As pointed out by Martin Honnen, on non-Mozilla browsers, <xsl:value-of select="$entity" disable-output-escaping="yes"/> should suffice (see https://xsltfiddle.liberty-development.net/ejivdH4/2).

Yet, for me, that still doesn't work, so I'm guessing I'm missing something important. Here are my full files (file.xml is shortened and changed, because the original is a work in progress by others, but the result is the same).

file.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>


<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Title</title>
         </titleStmt>
         <publicationStmt>
            <p>Publication Information</p>
         </publicationStmt>
         <sourceDesc>
            <p>Information about the source</p>
         </sourceDesc>
      </fileDesc>
      <encodingDesc>
         <charDecl>
            <desc>Variant letter forms</desc>
            <glyph xml:id="aalig">
               <glyphName>LATIN SMALL LIGATURE AA</glyphName>
               <mapping type="facs">U+EFA0</mapping>
               <mapping type="norm">aa</mapping>
            </glyph>           
            <glyph xml:id="fins">
               <glyphName>LATIN SMALL LETTER INSULAR F</glyphName>
               <mapping type="facs">U+F207</mapping>
               <mapping type="norm">f</mapping>
            </glyph>
            <glyph xml:id="jscap">
               <glyphName>LATIN LETTER SMALL CAPITAL J</glyphName>
               <mapping type="facs">U+1D0A</mapping>
            </glyph>
            <glyph xml:id="nscap">
               <glyphName>LATIN LETTER SMALL CAPITAL N</glyphName>
               <mapping type="facs">U+0274</mapping>
            </glyph>
            <glyph xml:id="rrot">
               <glyphName>LATIN SMALL LETTER R ROTUNDA</glyphName>
               <mapping type="facs">U+A75B</mapping>
               <mapping type="norm">r</mapping>
            </glyph>
            <glyph xml:id="rscap">
               <glyphName>LATIN LETTER SMALL CAPITAL R</glyphName>
               <mapping type="facs">U+0280</mapping>
            </glyph>
            <glyph xml:id="slong">
               <glyphName>LATIN SMALL LETTER LONG S</glyphName>
               <mapping type="facs">U+017F</mapping>
               <mapping type="norm">s</mapping>
            </glyph>
            <glyph xml:id="sscap">
               <glyphName>LATIN LETTER SMALL CAPITAL S</glyphName>
               <mapping type="facs">U+A731</mapping>
            </glyph>
         </charDecl>
         <charDecl>
            <desc>Abbreviation marks</desc>
            <glyph xml:id="ar">
               <glyphName>LATIN ABBREVIATION SIGN</glyphName>
               <mapping type="facs">U+036C</mapping>
            </glyph>
            <glyph xml:id="asup">
               <glyphName>COMBINING LATIN SMALL LETTER A</glyphName>
               <mapping type="facs">U+0363</mapping>
            </glyph>
            <glyph xml:id="bar">
               <glyphName>COMBINING ABBREVIATION MARK BAR ABOVE</glyphName>
               <mapping type="facs">U+0305</mapping>
            </glyph>
            <glyph xml:id="combcurl">
               <glyphName>COMBINING OGONEK ABOVE</glyphName>
               <mapping type="facs">U+1DCE</mapping>
            </glyph>
            <glyph xml:id="csup">
               <glyphName>COMBINING LATIN SMALL LETTER C</glyphName>
               <mapping type="facs">U+0368</mapping>
            </glyph>
            <glyph xml:id="dot">
               <glyphName>DOT ABOVE</glyphName>
               <mapping type="facs">U+02D9</mapping>
            </glyph>
            <glyph xml:id="dsup">
               <glyphName>COMBINING LATIN SMALL LETTER D</glyphName>
               <mapping type="facs">U+0369</mapping>
            </glyph>
            <glyph xml:id="er">
               <glyphName>COMBINING ABBREVIATION MARK ZIGZAG ABOVE</glyphName>
               <mapping type="facs">U+035B</mapping>
            </glyph>
            <glyph xml:id="et">
               <glyphName>LATIN ABBREVIATION SIGN SMALL ET WITH STROKE</glyphName>
               <mapping type="facs">U+F158</mapping>
               <mapping type="norm">&amp;</mapping>
            </glyph>
            <glyph xml:id="ezh">
               <glyphName>LATIN SMALL LETTER EZH</glyphName>
               <mapping type="facs">U+0292</mapping>
            </glyph>
            <glyph xml:id="isup">
               <glyphName>COMBINING LATIN SMALL LETTER I</glyphName>
               <mapping type="facs">U+0365</mapping>
            </glyph>
            <glyph xml:id="nsup">
               <glyphName>COMBINING LATIN SMALL LETTER N</glyphName>
               <mapping type="facs">U+F021</mapping>
            </glyph>
            <glyph xml:id="osup">
               <glyphName>COMBINING LATIN SMALL LETTER O</glyphName>
               <mapping type="facs">U+0366</mapping>
            </glyph>
            <glyph xml:id="ra">
               <glyphName>COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE</glyphName>
               <mapping type="facs">U+F1C1</mapping>
            </glyph>
            <glyph xml:id="rsup">
               <glyphName>COMBINING LATIN SMALL LETTER R</glyphName>
               <mapping type="facs">U+036C</mapping>
            </glyph>
            <glyph xml:id="tsup">
               <glyphName>COMBINING LATIN SMALL LETTER T</glyphName>
               <mapping type="facs">U+036D</mapping>
            </glyph>
            <glyph xml:id="ur">
               <glyphName>COMBINING ABBREVIATION MARK SUPERSCRIPT UR ROUND R FORM</glyphName>
               <mapping type="facs">U+F153</mapping>
            </glyph>
            <glyph xml:id="us">
               <glyphName>COMBINING US ABOVE</glyphName>
               <mapping type="facs">U+1DD2</mapping>
            </glyph>
            <glyph xml:id="zsup">
               <glyphName>COMBINING LATIN SMALL LETTER Z</glyphName>
               <mapping type="facs">U+00B3</mapping>
            </glyph>
         </charDecl>
      </encodingDesc>

   </teiHeader>
   <text>
      <body> 

         <!-- Add your data between here ... -->


         <div type="miracle" n="75">

            <pb n="473"/>


            <head> <lb n="2"/>Bla</head>

            <p>
               <g ref="#slong"/>em 
            </p>

         </div>
      </body>
   </text>
</TEI>

page.html

<!DOCTYPE html>
<html>
    <head>
      <meta charset="utf-8"/>
        <script>
function loadXMLDoc(filename)
{
if (window.ActiveXObject)
  {
  xhttp = new ActiveXObject("Msxml2.XMLHTTP");
  }
else
  {
  xhttp = new XMLHttpRequest();
  }
xhttp.open("GET", filename, false);
try {xhttp.responseType = "msxml-document"} catch(err) {} // Helping IE11
xhttp.send(""); 
return xhttp.responseXML;
} 

function displayResult(style)
{
console.log('Generating...');
xml = loadXMLDoc("file.xml");
xsl = loadXMLDoc(style);
// code for IE
if (window.ActiveXObject || xhttp.responseType == "msxml-document")
  {
  ex = xml.transformNode(xsl);
  document.getElementById("example").innerHTML = ex;
  }
// code for Chrome, Firefox, Opera, etc.
else if (document.implementation && document.implementation.createDocument)
  {
  xsltProcessor = new XSLTProcessor();
  xsltProcessor.importStylesheet(xsl);
  resultDocument = xsltProcessor.transformToFragment(xml, document);
  const node = document.getElementById("example");
  while (node.firstChild){
    node.removeChild(node.firstChild);
   }
  node.appendChild(resultDocument);
  }
}
</script>
    </head> 
    <body onload="displayResult('facs.xsl')">
      <h1>Test</h1>
      <div>
        <button onclick="displayResult('facs.xsl')">facs</button>
        <button onclick="displayResult('dipl.xsl')">dipl</button>
      </div>
        <div id="example"></div>
    </body>
</html>

facs.xsl

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0">


  <xsl:key name="glyphs" match="tei:glyph" use="@xml:id"/>


<xsl:template match="/">
  <h3>TEI Rendering: Facsimile</h3>
  <div>
    <xsl:apply-templates select="//tei:div[@type='miracle']"/>
  </div>
</xsl:template>


  <xsl:template match="tei:div[@type='miracle']">
    <h5>
      Miracle: 
      <xsl:value-of select="@n"/>
    </h5>
    <div class="miracle">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="tei:head">
    <div style="color:red">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="tei:pb">
    <br/>
    (<xsl:value-of select="@n"/>)
    <br/>
  </xsl:template>

  <xsl:template match="tei:lb">
    <br/><xsl:value-of select="@n"/>: 
  </xsl:template>

  <xsl:template match="tei:am">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="tei:g">
    <xsl:variable name="g_name" select="substring(@ref,2)"/>
    <xsl:variable name="glyph" select="key('glyphs', $g_name)"/>
    <xsl:variable name="mapping" select="$glyph/tei:mapping[@type='facs']"/>
    <xsl:variable name="entity" select="concat('&amp;#x',substring($mapping,3),';')"/>
    <xsl:value-of select="$entity" disable-output-escaping="yes"/>



    <xsl:variable name="something" select="'&amp;#x0305;'"/>
    {<xsl:value-of select="$something" disable-output-escaping="yes"/>}
  </xsl:template>



</xsl:stylesheet>

dipl.xsl

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:tei="http://www.tei-c.org/ns/1.0">



<xsl:template match="/">
  <h3>TEI Rendering: Diplomatic</h3>
  <div>
    <xsl:apply-templates select="//tei:div[@type='miracle']"/>
  </div>
</xsl:template>


  <xsl:template match="tei:div[@type='miracle']">
    <h5>
      Miracle: 
      <xsl:value-of select="@n"/>
    </h5>
    <div class="miracle">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="tei:head">
    <div style="color:red">
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <xsl:template match="tei:pb">
     || 
  </xsl:template>

  <xsl:template match="tei:lb">
    |
  </xsl:template>

  <xsl:template match="tei:ex">
    <i>
      <xsl:apply-templates/>
    </i>
  </xsl:template>



</xsl:stylesheet>

I'm viewing the file via localhost (with a Python server running) in my browser.

Any thoughts on what I might be missing or doing wrong?

Note: A lookup table is not what I want, because potentially there might be as many special characters in a TEI XML file as there are Unicode characters. That's what the glyph mappings are there for.

XSLT 2.0 might be an option, but I haven't figured out how to do a 2.0 transformation in the browser via JavaScript.

Edit 2:

I don't know what went wrong when I first tested it, but in IE it works with <xsl:value-of select="$entity" disable-output-escaping="yes"/>.
But since it doesn't work in Firefox, I decided to change the whole design: I transform the XML on the server side with PHP and send the HTML to the client; that should work with every browser.


Solution

  • If you target Chrome, Edge, or IE, then I think using <xsl:value-of select="$entity" disable-output-escaping="yes"/> will suffice. In https://xsltfiddle.liberty-development.net/ejivdH4/2, with the transformation done in the browser using the JavaScript API, that outputs ſem in Chrome and Edge and the hexadecimal character reference &#x017F;em in IE.

    Mozilla browsers are known not to support disable-output-escaping, so for cross-browser, client-side XSLT 1.0, the suggestion "to construct your own lookup table" by michael.hor257k is probably the better option.
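
    A lookup table does not have to be written by hand; it could, for instance, be generated from the glyph declarations themselves. The sketch below is only an illustration of that idea and not the approach michael.hor257k described. The function buildGlyphTable and the { id, facs } input shape are assumptions made for this example; in the page, the entries would be read from the glyph elements of the loaded XML.

```javascript
// Build a lookup table (glyph id -> character) from glyph declarations.
// `glyphs` is a plain array of { id, facs } objects, a hypothetical shape
// chosen for this sketch. Entries whose `facs` value is not a valid
// "U+XXXX" code point are collected in `invalid` instead of being mapped.
function buildGlyphTable(glyphs) {
  const table = new Map();
  const invalid = [];
  for (const { id, facs } of glyphs) {
    // Accept only "U+" followed by 4 to 6 hexadecimal digits.
    if (/^U\+[0-9A-Fa-f]{4,6}$/.test(facs)) {
      const codePoint = parseInt(facs.slice(2), 16);
      if (codePoint <= 0x10FFFF) {
        table.set(id, String.fromCodePoint(codePoint));
        continue;
      }
    }
    invalid.push(id);
  }
  return { table, invalid };
}

const { table, invalid } = buildGlyphTable([
  { id: "slong", facs: "U+017F" },
  { id: "rrot", facs: "U+A75B" },
  { id: "broken", facs: "U+12G4" }, // not valid hex, so it is reported
]);
console.log(table.get("slong")); // ſ
console.log(invalid.length);     // 1
```

    Reporting the invalid ids separately makes mistyped code points in the header visible early, instead of having them silently produce wrong output.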