
Is there a way to disregard a referenced dtd when running an xslt?


When I run the following templates using Saxon in Oxygen:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    expand-text="yes"
    version="3.0">    
    <xsl:output indent="yes" method="xml" omit-xml-declaration="no" encoding="utf-8"/>
    
    <xsl:template match="/">   
        <xsl:text>&#xa;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>
    
    <xsl:template match="*">
<!-- On Martin's suggestion I should use node-name instead of name, so I have changed this, but the result is the same. -->
        <xsl:value-of select="node-name()"/><xsl:text>&#xa;</xsl:text>
        <xsl:apply-templates select="*"/>
    </xsl:template>
</xsl:stylesheet>

On this xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ddn PUBLIC "-//S1000D//DTD Data Dispatch Note 20050501//EN//XML" "http://www.s1000d.org/s1000d_2-2/xml_dtd/ddn/dtd/ddn.dtd">
<ddn>
    <ddnc>
        <modelic>ABC</modelic>
        <sendid>AASSD</sendid>
        <recvid>VVBBN</recvid>
        <diyear>2024</diyear>
        <seqnum>00001</seqnum>
    </ddnc>
</ddn>

I get the following output:

<?xml version="1.0" encoding="utf-8"?>
ddn
ddnc
modelic
sendid
recvid
diyear
seqnum

So clearly (to me at least), the transform knows the element names.

If I change the template matching the elements to:

<xsl:template match="ddn">
     <xsl:value-of select="node-name()"/><xsl:text>&#xa;</xsl:text>
     <xsl:apply-templates select="*"/>
</xsl:template>

I get the following which doesn't include any element names:

<?xml version="1.0" encoding="utf-8"?>
        ABC
        AASSD
        VVBBN
        2024
        00001

If I remove the doctype declaration and run the same transformation I get:

<?xml version="1.0" encoding="utf-8"?>
ddn
        ABC
        AASSD
        VVBBN
        2024
        00001

So the root ddn element is now matched. My conclusion is that the DTD is being used by the transformation.

I would prefer to disregard the DTD instead of trying to correct something in it, since the DTD isn't mine to begin with. I just need to transform the content of the file I received. The XML included in this question is only a small fragment of an actual file, but the problem is the same no matter the content.

So how can I get around this problem? Do I need to add some namespace to my rules (although the name function didn't produce anything like that), or can I tell Saxon to disregard the DTD? It looks as if this is the default in the settings, but I suspect there is something else I am missing here.

I have tried the same transform using XMLSpy with the built-in XSLT engine, and it behaves in the same way.

If I use * as a namespace wildcard and change the template matching the elements to:

<xsl:template match="*:ddn">
     <xsl:value-of select="node-name()"/><xsl:text>&#xa;</xsl:text>
     <xsl:apply-templates select="*"/>
</xsl:template>

I get:

<?xml version="1.0" encoding="utf-8"?>
ddn
        ABC
        AASSD
        VVBBN
        2024
        00001

So this works, but why?!?

Suggestions?


Solution

  • I downloaded one version of the DTD, and it contains:

    <!ELEMENT ddn  (rdf:Description?,ddnc,issdate,security,datarest?,
                    dispto,dispfrom,authrtn,mediaid?,remarks?,delivlst?) >
    <!ATTLIST ddn
          id            ID      #IMPLIED
          xmlns         CDATA   #FIXED  "http://www.s1000d.org/ddn"
              %RDFDCATT; >
    

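    The #FIXED xmlns attribute means that, whenever the DTD is read, the parser adds a default namespace declaration to ddn, which puts ddn and every unprefixed descendant into the namespace http://www.s1000d.org/ddn. That is why match="ddn" (no namespace) fails while match="*:ddn" succeeds, and why the names are printed without a prefix. So, based on that, I would expect that declaring xpath-default-namespace="http://www.s1000d.org/ddn" in the XSLT allows you to select and/or match the non-prefixed elements such as ddn, ddnc, issdate, etc.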
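
    As a minimal sketch of that suggestion, the stylesheet below declares xpath-default-namespace and prints each element name together with the namespace URI the parser assigned, which makes the effect of the DTD visible. It assumes the DTD actually resolved for your file carries the same #FIXED xmlns value as the copy quoted above. The switch to method="text" is my own simplification and not part of the original stylesheet.

    ```xml
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xpath-default-namespace="http://www.s1000d.org/ddn"
        version="3.0">
        <xsl:output method="text" encoding="utf-8"/>

        <!-- Unprefixed names in match/select now refer to the ddn namespace -->
        <xsl:template match="ddn | ddnc">
            <!-- Print the local name and the namespace URI of the element -->
            <xsl:value-of select="local-name(), namespace-uri()" separator=" in "/>
            <xsl:text>&#xa;</xsl:text>
            <xsl:apply-templates select="*"/>
        </xsl:template>

        <!-- Children of ddnc: print the element name and its text value -->
        <xsl:template match="ddnc/*">
            <xsl:value-of select="local-name(), string()" separator=": "/>
            <xsl:text>&#xa;</xsl:text>
        </xsl:template>
    </xsl:stylesheet>
    ```

    With this declaration the templates only match when the namespace is present, so a file without the DOCTYPE declaration would no longer match. If you need to handle both kinds of input, the *:ddn form from the question matches in either case.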