
How to do an XSL transform in Java using a non-namespace-aware parser?


I use TagSoup as the (SAX) XMLReader and set the namespaces feature to false. This parser then feeds the Transformer as a SAXSource. Complete code:

    final TransformerFactory factory = TransformerFactory.newInstance();
    final Transformer t = factory.newTransformer(new StreamSource(
        getClass().getResourceAsStream("/identity.xsl")));

    final XMLReader p = new Parser(); // the tagsoup parser
    p.setFeature("http://xml.org/sax/features/namespaces", false);

    // getHtml() returns HTML as InputStream
    final Source source = new SAXSource(p, new InputSource(getHtml())); 

    t.transform(source, new StreamResult(System.out));

This results in something like:

< xmlns:html="http://www.w3.org/1999/xhtml">
<>
<>
<>
<>
< height="17" valign="top">

The problem is that the element names are blank. The XMLReader (the TagSoup parser) reports an empty namespaceURI and an empty localName to the SAX methods ContentHandler#startElement and ContentHandler#endElement. For a parser that is not namespace-aware this is allowed (see the Javadoc).

If I add an XMLFilter that copies the value of the qName to the localName, everything works. However, this is not what I want; I expect this to work "out of the box". What am I doing wrong? Any input would be appreciated!
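For reference, the filter workaround described above could look roughly like the following. This is only a sketch: the class name `QNameToLocalNameFilter` is made up for illustration, and it assumes that only element names (not attribute names) need patching.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical helper: fills in an empty localName from the qName before
// the event reaches the downstream ContentHandler (here, the Transformer).
class QNameToLocalNameFilter extends XMLFilterImpl {

    QNameToLocalNameFilter() {
        super();
    }

    QNameToLocalNameFilter(XMLReader parent) {
        super(parent);
    }

    // Use the qName whenever the parser reported no local name.
    private static String pick(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        super.startElement(uri, pick(localName, qName), qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, pick(localName, qName), qName);
    }
}
```

It would be used by wrapping the parser, e.g. `new SAXSource(new QNameToLocalNameFilter(p), new InputSource(getHtml()))`.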


Solution

  • I expect this to work "out of the box". What am I doing wrong?

    What you are doing wrong is taking a technology (XSLT) that is defined to operate over namespace-well-formed XML and attempting to apply it to data that it is not intended to work with. If you want to use XSLT then you must enable namespaces, declare a prefix for the http://www.w3.org/1999/xhtml namespace in your stylesheet, and use that prefix consistently in your XPath expressions.

    If your transformer understands XSLT 2.0 (e.g. Saxon 9) then instead of declaring a prefix and prefixing your element names in XPath expressions, you can put xpath-default-namespace="http://www.w3.org/1999/xhtml" on the xsl:stylesheet element to make it treat unprefixed element names as references to that namespace. But in XSLT 1.0 (the default built-in Java Transformer implementation) your only option is to use a prefix.
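    As a rough illustration of the XSLT 1.0 approach, the sketch below runs a namespace-aware SAX parser into the built-in Transformer and selects XHTML elements through a declared prefix. It makes several assumptions: the JDK's own XML parser stands in for TagSoup so that the example is self-contained, the class name, the `h` prefix and the inline stylesheet are invented for the example, and the input is already well-formed XHTML. With TagSoup you would presumably keep the `http://xml.org/sax/features/namespaces` feature enabled instead of setting it to false.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class NamespaceAwareTransform {

    // XSLT 1.0 stylesheet: the XHTML namespace is bound to the prefix "h",
    // and every element name in the XPath expression carries that prefix.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:h='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='/h:html/h:body/h:p'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xhtml) {
        try {
            // Namespace awareness must be on so that the transformer
            // receives a namespace URI and local name for each element.
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();

            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));

            StringWriter out = new StringWriter();
            t.transform(
                new SAXSource(reader, new InputSource(new StringReader(xhtml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }
}
```

    An unprefixed path such as `/html/body/p` in the same stylesheet would select nothing, because in XSLT 1.0 unprefixed names in XPath always refer to elements in no namespace.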