javascript xml node.js xslt evernote

I'm trying to transform Evernote's ENML into Markdown using XSLT, and node_xslt complains about unknown tags and entities


Is there a way to apply XSLT to XML in server-side code? So far the best I have found is https://github.com/bsuh/node_xslt. But it has one major disadvantage: it doesn't seem to work with custom namespaces.

To be more precise: I'm trying to transform Evernote's ENML (https://dev.evernote.com/doc/articles/enml.php) into Markdown using XSLT, and node_xslt complains about unknown tags and entities.


Here is what I'm doing:

test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xslt"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note style="background: #e6e6e6;">
    <h1>Sample&nbsp;header</h1>
</en-note>

test.xslt:

<?xml version='1.0' encoding='utf-8'?>

<xsl:stylesheet
    version='1.0'
    xmlns:e="http://xml.evernote.com/pub/enml2.dtd"
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>

<xsl:output method='text' encoding='utf-8'/>

<xsl:template match="/*">
    <xsl:value-of select="name()"/>
</xsl:template>

</xsl:stylesheet>

main.js:

var xslt = require('node_xslt');
var stylesheet = xslt.readXsltFile('test.xslt');
var doc = xslt.readXmlFile('test.xml');
xslt.transform(stylesheet, doc, []);

And I get this error:

test.xml:5: parser error : Entity 'nbsp' not defined
    <h1>Sample&nbsp;header</h1>

When I try to read it as HTML, I get the following error:

test.xml:4: HTML parser error : Tag en-note invalid
<en-note style="background: #e6e6e6;">

I'm not asking how to use this library; I'm trying to understand whether, and how, the necessary transform can be done on the server side at all.


Solution

  • Your XML depends on the Evernote DTD, which is downloaded from:

    http://xml.evernote.com/pub/enml2.dtd

    That DTD in turn loads three other files. This one:

    http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent

    declares your &nbsp; entity:

    <!ENTITY nbsp   "&#160;"> <!-- no-break space = non-breaking space, U+00A0 ISOnum -->
    

    If either of those two files fails to load for any reason, you will get the Entity 'nbsp' not defined error.

    You can download all the DTDs and edit them so that they load from a local location. If &nbsp; is the only entity used in your file, you can also declare it locally in the document's internal subset:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="test.xslt"?>
    <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd" [
       <!ENTITY nbsp "&#160;">
    ]>
    <en-note style="background: #e6e6e6;">
        <h1>Sample&nbsp;header</h1>
    </en-note>
    

    That should fix the Entity 'nbsp' not defined error.

    The <en-note> tag is not an HTML element, so the HTML parser rejects it. Parse the document as XML against the Evernote DTD rather than as HTML.

    Your XSLT transform simply reads the root element's name and prints it, so once the document parses it should output en-note.
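
If the ENML arrives as a string (for example from the Evernote API) rather than as a file you can edit by hand, the local &nbsp; declaration described above could be added in code before parsing. The following is only a minimal sketch: `declareNbsp` is a hypothetical helper, not part of node_xslt, and it assumes the DOCTYPE uses a SYSTEM identifier with no internal subset yet, as in test.xml.

```javascript
// Hypothetical helper (not part of node_xslt): rewrites the DOCTYPE so that
// it carries an internal subset declaring &nbsp; locally.
function declareNbsp(enml) {
  // Matches a DOCTYPE with a SYSTEM identifier and no internal subset, e.g.
  // <!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
  var doctype = /<!DOCTYPE\s+([^\s>]+)\s+SYSTEM\s+("[^"]*"|'[^']*')\s*>/;
  // A replacer function is used so "$" in the input is never treated specially.
  return enml.replace(doctype, function (match, root, systemId) {
    return '<!DOCTYPE ' + root + ' SYSTEM ' + systemId +
      ' [\n  <!ENTITY nbsp "&#160;">\n]>';
  });
}

// Same content as test.xml, minus the xml-stylesheet instruction.
var sample = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">',
  '<en-note style="background: #e6e6e6;">',
  '    <h1>Sample&nbsp;header</h1>',
  '</en-note>'
].join('\n');

console.log(declareNbsp(sample));
```

A document that already has an internal subset, or no DOCTYPE at all, is returned unchanged. The patched string could then be handed to the XML parser; node_xslt's README lists a `readXmlString` function for that, but whether the transform then succeeds still depends on how the underlying parser handles the external DTD, so treat this as a starting point rather than a verified fix.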