python xml xslt xml-namespaces libxslt

Python, libxslt and finding objects in the default namespace


I've been having a terrible time finding any examples of XSLT processing with the Python libxml2 and libxslt libraries. I have a set of legacy documents that use a default namespace, and I've been trying to convert them into something I can import into a TinkerPop-compliant database. I can't figure out how to convince libxslt to find anything in the data.

As you can see from my examples, I can't seem to get anything from an inner template to render at all. It does seem to find the topmost (cmap) template, as it spits out the <graphml> boilerplate. I am fairly new to XSLT, so this may just be a gap in my understanding, but nobody on SO or Google seems to have any examples of this working.

I've thought about just ripping the offending default namespace out with a regexp, but parsing XML with a regexp is usually a bad plan, and it just seems like the wrong idea.

I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
  <cmap xmlns="http://cmap.ihmc.us/xml/cmap/">
    <map width="1940" height="3701">
      <concept-list>
        <concept id="1JNW5YSZP-14KK308-5VS2" label="Solving Linear&#xa;Systems by&#xa;Elimination&#xa;[MAT.ALG.510]"/>
        <concept id="1JNW55K3S-27XNMQ0-5T80" label="Using&#xa;Inequalities&#xa;[MAT.ALG.423]"/>
      </concept-list>
    </map>
  </cmap>

There's much more, but this is a sample of it. Using the xpathRegisterNs() method, I was able to register the default namespace under a prefix and find my map, concept-map, etc. with it. I have not had the same luck when trying to process this with libxslt.

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:c="http://cmap.ihmc.us/xml/cmap/">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="c:cmap">
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns">
      <xsl:apply-templates select="c:concept"/>
    </graphml>      
  </xsl:template>
  <xsl:template match="c:concept">
    <node> Found a node </node>
  </xsl:template>
</xsl:stylesheet>

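And the Python experiment is just:

 import libxml2
 import libxslt
 # Parse the stylesheet and compile it.
 styledoc = libxml2.parseFile("cxltographml.xsl")
 style = libxslt.parseStylesheetDoc(styledoc)
 # Parse the source document and apply the transform (no parameters).
 doc = libxml2.parseFile("algebra.cxl")
 result = style.applyStylesheet(doc, None)
 print(style.saveResultToString(result))
 # Release the underlying C structures.
 style.freeStylesheet()
 doc.freeDoc()
 result.freeDoc()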

Solution

  • You've got the right technique for namespaces in the XSLT: you must map the namespace URI to a prefix, because a default namespace declaration doesn't apply to XPath expressions or template match patterns. The problem is that in your c:cmap template you're doing

      <xsl:apply-templates select="c:concept"/>
    

    But the cmap element doesn't have any direct children named concept. Try

      <xsl:apply-templates select="c:map/c:concept-list/c:concept"/>
    

    or, more generally (but potentially less efficiently),

      <xsl:apply-templates select=".//c:concept"/>
    

    to find all descendant concept elements rather than just immediate children.

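    Also, in the c:concept template you will need to add xmlns="http://graphml.graphdrawing.org/xmlns" to the <node> element. A literal result element takes its namespace from the declarations in scope in the stylesheet, and the declaration on <graphml> in the other template is not in scope here, so without it <node> will be output in no namespace (with xmlns="").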
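
To see both selection points in isolation, here is a minimal sketch that uses Python's standard xml.etree.ElementTree instead of libxslt. That swap is an assumption made so the example runs without extra packages, and ElementTree supports only a subset of XPath. The sample XML is trimmed from the question, and the names CXL, NS and count are made up for the illustration. The same reasoning should carry over to the select expressions in the stylesheet.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document from the question (labels shortened).
CXL = """<cmap xmlns="http://cmap.ihmc.us/xml/cmap/">
  <map width="1940" height="3701">
    <concept-list>
      <concept id="1JNW5YSZP-14KK308-5VS2" label="Solving Linear Systems"/>
      <concept id="1JNW55K3S-27XNMQ0-5T80" label="Using Inequalities"/>
    </concept-list>
  </map>
</cmap>"""

# Map the document's default namespace URI to a prefix of our choosing.
NS = {"c": "http://cmap.ihmc.us/xml/cmap/"}

root = ET.fromstring(CXL)  # root is the <cmap> element


def count(path):
    """Number of elements matched by path, evaluated relative to <cmap>."""
    return len(root.findall(path, NS))


# An unprefixed name means "no namespace", so it matches nothing here.
unprefixed = count(".//concept")
# Prefixed, but <cmap> has no direct <concept> children.
direct_child = count("c:concept")
# Spelling out the full child path reaches both concepts.
full_path = count("c:map/c:concept-list/c:concept")
# The descendant search also reaches both concepts.
descendants = count(".//c:concept")

print(unprefixed, direct_child, full_path, descendants)  # prints: 0 0 2 2
```

The two zero counts correspond to the two failure modes: dropping the prefix, and selecting c:concept as a direct child of cmap, which is what the original stylesheet does.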