RSS/Python - Parsing Single Image URL


I'm learning to parse XML and RSS feeds correctly and have run into a small problem. I'm using feedparser in Python to parse a specific entry from an RSS feed, but I can't figure out how to extract just a single img src from the content section.

Here's what I have so far.

import dirFeedparser.feedparser as feedparser

feedurl = feedparser.parse('http://dustinheroin.chompblog.com/index.php?cat=22&feed=rss2')
statusupdate = feedurl.entries[0].content

print statusupdate

Now, when I print the content I get this:

[{'base': u'http://dustinheroin.chompblog.com/index.php?cat=22&feed=rss2', 'type': u'text/html', 'value': u'<p><a href="http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-154945.jpg"><img alt="20120129-154945.jpg" class="alignnone size-full" src="http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-154945.jpg" /></a></p>', 'language': None}]

What method would be best to get the IMG SRC from that? Any help is appreciated, thanks!


Solution

  • @Lattyware, there is a problem with how you set up the soup.

    @user1130601, you can check the following code:

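    #!/usr/bin/python
    
    from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import path
    import feedparser
    
    # Fetch and parse the feed, then take the content of the first entry.
    feedurl = feedparser.parse('http://dustinheroin.chompblog.com/index.php?cat=22&feed=rss2')
    statusupdate = feedurl.entries[0].content
    
    # content is a list of dicts; the HTML markup is under the 'value' key.
    soup = BeautifulSoup(statusupdate[0]['value'])
    # find() returns the first <img> tag; index it to read its src attribute.
    print(soup.find("img")["src"])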
    

    Output:

    http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-171134.jpg
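
If installing BeautifulSoup is not an option, roughly the same result can be had with the standard library alone. The following is only a sketch for Python 3, not part of the answer above: the names FirstImgSrc and first_img_src are made up for illustration, and the sample content is copied from the output printed in the question rather than fetched from the live feed.

```python
from html.parser import HTMLParser


class FirstImgSrc(HTMLParser):
    """Remember the src attribute of the first <img> tag seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        # Self-closing tags such as <img ... /> also end up here by default.
        if tag == "img" and self.src is None:
            self.src = dict(attrs).get("src")


def first_img_src(html):
    """Return the src of the first <img> in html, or None if there is none."""
    parser = FirstImgSrc()
    parser.feed(html)
    parser.close()
    return parser.src


# Same shape as feedurl.entries[0].content in the question: a list of dicts
# with the markup under 'value'. Sample data taken from the question's output.
content = [{
    "type": "text/html",
    "value": '<p><a href="http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-154945.jpg">'
             '<img alt="20120129-154945.jpg" class="alignnone size-full" '
             'src="http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-154945.jpg" /></a></p>',
}]

print(first_img_src(content[0]["value"]))
```

With a real feed, content[0]["value"] would be replaced by feedurl.entries[0].content[0]['value'] as in the code above. Returning None when no image is present avoids the error that indexing a missing tag would raise.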