I use TagSoup as the (SAX) XMLReader and set the namespaces feature to false. This parser is used to feed the Transformer as a SAXSource. Complete code:
final TransformerFactory factory = TransformerFactory.newInstance();
final Transformer t = factory.newTransformer(new StreamSource(
        getClass().getResourceAsStream("/identity.xsl")));
final XMLReader p = new Parser(); // the tagsoup parser
p.setFeature("http://xml.org/sax/features/namespaces", false);
// getHtml() returns HTML as InputStream
final Source source = new SAXSource(p, new InputSource(getHtml()));
t.transform(source, new StreamResult(System.out));
This results in something like:
< xmlns:html="http://www.w3.org/1999/xhtml">
<>
<>
<>
<>
< height="17" valign="top">
The problem is that the tag names are blank. The XMLReader (the TagSoup parser) reports an empty namespaceURI and an empty local name in the SAX methods ContentHandler#startElement and ContentHandler#endElement. For a parser that is not namespace-aware this is allowed (see the Javadoc).
If I add an XMLFilter that copies the value of qName to localName, everything works. However, this is not what I want; I expect this to work "out of the box". What am I doing wrong? Any input would be appreciated!
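For reference, the workaround filter described above might look roughly like the sketch below. The class name QNameToLocalNameFilter and the helper pick are made up for illustration; only XMLFilterImpl and its callbacks come from the standard SAX API. It covers element names only, since those are what come out blank.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical workaround filter: when the parent reader reports an empty
// local name (as a non-namespace-aware parser may), pass the qName instead.
class QNameToLocalNameFilter extends XMLFilterImpl {

    QNameToLocalNameFilter(XMLReader parent) {
        super(parent);
    }

    // Returns localName unless it is null or empty, in which case qName is used.
    static String pick(String localName, String qName) {
        return (localName == null || localName.isEmpty()) ? qName : localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        super.startElement(uri, pick(localName, qName), qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, pick(localName, qName), qName);
    }
}
```

It would be wired in by wrapping the parser before building the source, e.g. new SAXSource(new QNameToLocalNameFilter(p), new InputSource(getHtml())).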
I expect this to work "out of the box". What am I doing wrong?
What you are doing wrong is taking a technology (XSLT) that is defined to operate on namespace-well-formed XML and applying it to data it is not intended to work with. If you want to use XSLT, you must enable namespaces, declare a prefix for the http://www.w3.org/1999/xhtml namespace in your stylesheet, and use that prefix consistently in your XPath expressions.
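A minimal sketch of the prefix approach follows. To keep it self-contained it uses the JDK's own namespace-aware SAX parser on a small XHTML string instead of TagSoup; with TagSoup you would set http://xml.org/sax/features/namespaces to true on the Parser instead. The class name, the prefix h, and the sample stylesheet are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class PrefixedXhtmlDemo {

    // XSLT 1.0 stylesheet: the prefix "h" is bound to the XHTML namespace
    // and used on every element name in the XPath expression.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:h='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='/h:html/h:head/h:title'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String extractTitle(String xhtml) throws Exception {
        // Stand-in for TagSoup: a namespace-aware reader from the JDK.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new SAXSource(reader, new InputSource(new StringReader(xhtml))),
                    new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<html xmlns='http://www.w3.org/1999/xhtml'>"
                   + "<head><title>Hello</title></head><body/></html>";
        System.out.println(extractTitle(doc));
    }
}
```

An unprefixed path such as /html/head/title would select nothing here, because in XSLT 1.0 unprefixed names in XPath always refer to elements in no namespace.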
If your transformer understands XSLT 2.0 (e.g. Saxon 9), then instead of declaring a prefix and prefixing your element names in XPath expressions, you can put xpath-default-namespace="http://www.w3.org/1999/xhtml" on the xsl:stylesheet element to make it treat unprefixed element names as references to that namespace. But in XSLT 1.0 (which is what the default built-in Java Transformer implementation supports) your only option is to use a prefix.