
Can't convert XML string to W3C Document


I want to convert a Java string containing XML to a W3C DOM Document object.

I searched all over the place first and found some good examples here on Stack Overflow, but sadly I can't get them working.

Apparently my code is not working 100%.

It seems to parse the string, but there are no values in the nodes. This is what I have so far:

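import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Wrap the XML string in an InputSource so the parser can read it
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(TestFiles.RSS_FEED_FILE_2));

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document newDoc = builder.parse(is);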

When I print the result afterwards like this:

System.out.println(newDoc.getDocumentElement().getElementsByTagName("channel").item(0)
.getNodeValue());

I get null as output, whereas with this statement:

System.out.println(newDoc.getDocumentElement().getElementsByTagName("channel").item(0));

I get the output: [channel: null]

So the node object exists (otherwise I would get a NullPointerException), but it doesn't seem to contain any values.

The content of the constant is this:

public final static String RSS_FEED_FILE_2 =    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + 
                                            "<rss version=\"2.0\">\n" + 
                                            "<channel>\n" + 
                                            "<title>sunday</title>\n" + 
                                            "<link>http://www.google.nl</link>\n" + 
                                            "<pubDate>2012-02-05 20:58</pubDate>\n" + 
                                            "<lastBuildDate>2012-02-08 09:48</lastBuildDate>\n" + 
                                            "<description>blabla </description>\n" + 
                                            "<item>\n" + 
                                            "<title><![CDATA[title]]></title>\n" + 
                                            "<link><![CDATA[http://www.google.nl]]></link>\n" + 
                                            "<guid><![CDATA[2266610]]></guid>\n" + 
                                            "<source><![CDATA[sunday]]></source>\n" + 
                                            "<author><![CDATA[me]]></author>\n" + 
                                            "<description><![CDATA[blalbalavblabllllll!]]></description>\n" + 
                                            "</item>\n" + 
                                            "</channel>\n" + 
                                            "</rss>";

Does anybody have a solution or a hint?


Solution

  • This is quite a common gotcha. The behaviour of getNodeValue() depends on the subclass of Node. In the case of an Element, getNodeValue() will always return null (see the table in the Node javadoc for behaviour of other subclasses).

    Consider using getTextContent() if you want to debug the XML document.
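
To illustrate the difference, here is a minimal sketch. The class name NodeValueDemo, its helper methods, and the shortened XML string are assumptions made up for this example; they are not part of the code in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, named only for this illustration.
final class NodeValueDemo {

    // Parses an XML string into a DOM Document, the same way as in the question.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrapped in an unchecked exception to keep the sketch short.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // getNodeValue() of the first element with the given tag name.
    static String nodeValueOf(String xml, String tag) {
        Node node = parse(xml).getElementsByTagName(tag).item(0);
        return node.getNodeValue();
    }

    // getTextContent() of the first element with the given tag name.
    static String textContentOf(String xml, String tag) {
        Node node = parse(xml).getElementsByTagName(tag).item(0);
        return node.getTextContent();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the RSS constant in the question.
        String xml = "<rss version=\"2.0\"><channel><title>sunday</title></channel></rss>";

        // Element nodes have no node value of their own.
        System.out.println(nodeValueOf(xml, "title"));   // prints "null"

        // getTextContent() concatenates the text of all descendants.
        System.out.println(textContentOf(xml, "title")); // prints "sunday"

        // The text lives in a child Text node, whose node value is the text itself.
        Node title = parse(xml).getElementsByTagName("title").item(0);
        System.out.println(title.getFirstChild().getNodeValue()); // prints "sunday"
    }
}
```

So the parsing in the question appears to be fine; the element simply keeps its text in child nodes. Note that calling getTextContent() on channel would return the text of every nested element joined together, so for individual fields it is probably better to look up title, link, and so on separately.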