python, python-2.7, rss, feedparser, rss2

How do I access pubDate for RSS items using Python feedparser?


In this example RSS feed, the optional item element pubDate is included in all entries. But it is not available as an attribute of the entries returned by the Python module feedparser. This code:

import feedparser
rss_object = feedparser.parse("http://cyber.law.harvard.edu/rss/examples/rss2sample.xml")
for entry in rss_object.entries:
    print entry.pubDate

causes the error AttributeError: object has no attribute 'pubDate', but I can successfully run print entry.description and see the contents of all the description tags.


Solution

  • feedparser is an opinionated parser: it normalizes feed elements under its own names rather than simply returning the XML as a dictionary. The text of pubDate is available as entries[i].published.

    The date this entry was first published, as a string in the same format as it was published in the original feed.

    Working code:

    for entry in rss_object.entries:
        print entry.published
    

    Note: published is extracted from one of several possible XML tags depending on the format of the feed. See the reference manual for a list.

    The same manual also claims the pubDate element is parsed "as a date" in entries[i].published_parsed. The value of published_parsed is a time.struct_time object, which has no field for the original UTC offset; if the original feed included time zones, you may want to re-parse the published string yourself to keep that information.
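One possible way to re-parse the date yourself is sketched below. It assumes the published string is in the RFC 822 style that RSS 2.0 uses for pubDate, and relies only on the standard library's email.utils.parsedate_tz. The helper name split_pub_date and the sample string are made up for illustration; in real use you would pass entry.published instead.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_tz


def split_pub_date(published):
    """Return (naive datetime as written in the feed, UTC offset or None).

    Hypothetical helper, not part of feedparser.
    """
    parsed = parsedate_tz(published)
    if parsed is None:
        raise ValueError("not an RFC 822 style date: %r" % (published,))
    # The first six fields are year, month, day, hour, minute, second.
    local_time = datetime(*parsed[:6])
    # The tenth field is the UTC offset in seconds. Python 2 reports None
    # when the string has no time zone, so guard against that case.
    offset_seconds = parsed[9]
    if offset_seconds is None:
        return local_time, None
    return local_time, timedelta(seconds=offset_seconds)


# Made-up sample value; with feedparser this would be entry.published.
sample = "Tue, 03 Jun 2003 09:39:21 +0200"
local_time, offset = split_pub_date(sample)
print(local_time.isoformat())  # 2003-06-03T09:39:21
print(offset)                  # 2:00:00
```

The sketch keeps the wall-clock time and the offset as separate values so it runs on both Python 2.7 and Python 3. On Python 3 alone, email.utils.parsedate_to_datetime returns a single timezone-aware datetime and may be more convenient.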