Search code examples
pythonxmlrssminidom

All nodeValue fields are None when parsing XML


I'm building a simple web-based RSS reader in Python, but I'm having trouble parsing the XML. I started out by trying some stuff in the Python command line.

>>> from xml.dom import minidom
>>> import urllib2 
>>> url ='http://www.digg.com/rss/index.xml'
>>> xmldoc = minidom.parse(urllib2.urlopen(url))
>>> channelnode = xmldoc.getElementsByTagName("channel")
>>> channelnode = xmldoc.getElementsByTagName("channel")
>>> titlenode = channelnode[0].getElementsByTagName("title")
>>> print titlenode[0]
<DOM Element: title at 0xb37440> 
>>> print titlenode[0].nodeValue 
None

I played around with this for a while, but the nodeValue of everything seems to be None. Yet if you look at the XML, there definitely are values there. What am I doing wrong?


Solution

  • For RSS feeds you should try the Universal Feed Parser library. It simplifies the handling of RSS feeds immensly.

    import feedparser
    d = feedparser.parse('http://www.digg.com/rss/index.xml')
    title = d.channel.title