I am trying to parse an RSS feed using feedparser.
I am getting the title like this:
import feedparser
url = 'http://chem.aalto.fi/en/current/events/rss.xml'
feed = feedparser.parse(url)
for entry in feed.entries:
    title = entry.title
    print(title)
Typically this works without problems, but I encountered a strange case. In this particular feed, the titles look like this:
<title>06.11.2015: Some title text</title>
As expected, I sometimes get:
06.11.2015: Some title text
... but sometimes also this for the same item:
11/06/15: Some title text
Has anybody experienced a similar problem? It seems to be completely random.
This appears to be a bug on the server side. I had not seen this feed before, but when I requested it several times I also got both date formats, in an apparently random manner.
If your goal is to get a consistent date and title for each event, you could use the additional xcal metadata in that feed, for example by parsing it with dateutil:
import feedparser
import dateutil.parser
url = 'http://chem.aalto.fi/en/current/events/rss.xml'
feed = feedparser.parse(url)
for entry in feed.entries:
    title = entry.title.split(": ", 1)[1]
    start_time = dateutil.parser.parse(entry.xcal_dtstart)
    end_time = dateutil.parser.parse(entry.xcal_dtend)
    print("{} - {}: {}".format(start_time.date(), end_time.date(), title))
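Note that `split(": ", 1)[1]` raises an `IndexError` for any title that has no `": "` separator. As a more defensive variant, here is a minimal sketch that strips the leading date in either of the two formats seen in this feed and leaves other titles untouched. The helper name `strip_date_prefix` and the regular expression are my own assumptions, not something the feed or feedparser provides:

```python
import re

# Hypothetical pattern: matches a leading "06.11.2015: " or "11/06/15: ".
DATE_PREFIX = re.compile(r"^\s*\d{1,2}[./]\d{1,2}[./]\d{2,4}:\s*")


def strip_date_prefix(title):
    """Return the title without a leading date, whichever format it uses."""
    return DATE_PREFIX.sub("", title, count=1)


if __name__ == "__main__":
    for raw in ("06.11.2015: Some title text", "11/06/15: Some title text"):
        print(strip_date_prefix(raw))
```

In the loop above you could then write `title = strip_date_prefix(entry.title)` and keep taking the dates from the xcal fields.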
EDIT: Also, for what it's worth, that RSS feed seems to consistently output titles in the 06/15/16 format when using http://chem.aalto.fi/en/current/events/rss.xml?format=rss and in the 15.06.2016 format when using http://chem.aalto.fi/en/current/events/rss.xml?format=atom for the request.
The code used to generate the feed (based on generator="FeedCreator 1.7.6(BH)" at the top of the feed) can be seen at https://github.com/ajslater/feedcreator/blob/master/include/feedcreator.class.php
Based on that, my guess is that the Feedcreator library has some unintended side effect on the main code that generates the entry title, and that this side effect varies with the feed format used. If the feed format is not set explicitly in the URL, the server might be (incorrectly) using a cached version of either the format or the entire feed content. Either way, explicitly setting the format in the URL will most probably resolve this issue for you.
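To pin the format in code, something like the following might work. It is only a sketch, assuming Python 3 and that the server keeps honouring the `format` query parameter as observed above. `with_format` and `fetch_titles` are hypothetical helper names of mine, and feedparser is imported inside `fetch_titles` so that the URL helper can be used on its own:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

FEED_URL = 'http://chem.aalto.fi/en/current/events/rss.xml'


def with_format(url, fmt):
    """Return url with its 'format' query parameter set to fmt."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["format"] = fmt  # overrides any format already present
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_titles(url, fmt="atom"):
    """Fetch the feed in one explicit format and return the entry titles."""
    import feedparser  # third-party; only needed when actually fetching

    feed = feedparser.parse(with_format(url, fmt))
    return [entry.title for entry in feed.entries]


if __name__ == "__main__":
    for title in fetch_titles(FEED_URL, "atom"):
        print(title)
```

Requesting `format=atom` should give the 15.06.2016 style and `format=rss` the 06/15/16 style, if the behaviour I observed holds.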