I am trying to scrape all the titles from this RSS feed:
http://www.quora.com/Python-programming-language-1/rss
This is my code:
import urllib2
import re
content = urllib2.urlopen('http://www.quora.com/Python-programming-language-1/rss').read()
allTitles = re.compile('<title>(.*)</title>')
list = re.findall(allTitles,content)
for e in range(0, 2):
    print list[e]
However, instead of a list of titles, the output is a large chunk of raw markup from the RSS source. What am I doing wrong?
You should make the quantifier non-greedy by adding ? after * in the expression:
#allTitles = re.compile('<title>(.*)</title>')
allTitles = re.compile('<title>(.*?)</title>')
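Without the ?, the .* is greedy: it matches as much as it can, so everything from the first <title> up to the last </title> on the same line ends up in the (.*) group. With .*? the match stops at the nearest </title>, so each title is captured separately.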
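The difference is easy to see on a small string. This is only a minimal sketch: the sample string below is made up for illustration and is not taken from the Quora feed, and the variable names are hypothetical.

```python
import re

# Hypothetical one-line fragment standing in for the real feed content.
sample = '<title>First</title><link>a</link><title>Second</title>'

# Greedy: .* runs to the last </title> on the line, giving one long match.
greedy = re.findall('<title>(.*)</title>', sample)

# Non-greedy: .*? stops at the nearest </title>, giving one match per title.
non_greedy = re.findall('<title>(.*?)</title>', sample)

print(greedy)
print(non_greedy)
```

The first list holds a single string containing the markup between the two titles, which is the kind of output described in the question; the second holds each title on its own.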