
Python 3.7, Feedparser module cannot parse BBC weather feed


When I parse the example RSS link provided by BBC Weather, I only get an empty feed. The example link is: "https://weather-broker-cdn.api.bbci.co.uk/en/forecast/rss/3day/2643123"

I've tried using the feedparser module in Python. I would like to do this in either Python or C++, but Python seemed easier. I've also tried rewriting the URL without https:// and with .xml, and it still doesn't work.

import feedparser
d = feedparser.parse('https://weather-broker-cdn.api.bbci.co.uk/en/forecast/rss/3day/2643123')
print(d)

This should give a result similar to the RSS feed at the link, but it just returns an empty feed.


Solution

  • First, I know you got no result rather than an error, as I did. Perhaps you are running a different version. As I mentioned, the feed yields a result with an older version under Python 2, in a program that has been running solidly every night for about 5 years, but it throws an exception with a freshly installed feedparser 5.2.1 on Python 3.7.4 (64-bit).

    I'm not entirely sure what is going on, but the function _gen_georss_coords is throwing a StopIteration once it runs out of coordinates. I have seen several references to this error in connection with the implementation of PEP 479. The function is written as a generator, but for your RSS feed it only has to return one tuple. Here is the offending function:

    def _gen_georss_coords(value, swap=True, dims=2):
        # A generator of (lon, lat) pairs from a string of encoded GeoRSS
        # coordinates. Converts to floats and swaps order.
        latlons = map(float, value.strip().replace(',', ' ').split())
        nxt = latlons.__next__
        while True:
            t = [nxt(), nxt()][::swap and -1 or 1]
            if dims == 3:
                t.append(nxt())
            yield tuple(t)
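A minimal, feedparser-free sketch of what appears to go wrong, assuming Python 3.7 or later (where PEP 479 is always in effect). The names pairs_unsafe, pairs_safe and run_unsafe are hypothetical: pairs_unsafe mimics the loop above by calling next() twice per step and relying on StopIteration to stop, while pairs_safe uses the pattern PEP 479 recommends, catching StopIteration and ending the generator with return.

```python
import sys


def pairs_unsafe(numbers):
    """Hypothetical stand-in for the original loop: it relies on the
    StopIteration raised by next() to end the generator."""
    it = iter(numbers)
    while True:
        yield (next(it), next(it))


def pairs_safe(numbers):
    """Hypothetical PEP 479-compliant version: catch StopIteration and
    end the generator with an explicit return."""
    it = iter(numbers)
    while True:
        try:
            pair = (next(it), next(it))
        except StopIteration:
            return
        yield pair


def run_unsafe(numbers):
    try:
        return list(pairs_unsafe(numbers))
    except RuntimeError as exc:
        # On Python 3.7+ the leaked StopIteration arrives here as a RuntimeError.
        return "RuntimeError: %s" % exc


print(sys.version_info[:2])
print(run_unsafe([51.5, -0.12]))        # RuntimeError: generator raised StopIteration
print(list(pairs_safe([51.5, -0.12])))  # [(51.5, -0.12)]
```

Note that the first pair is yielded normally by pairs_unsafe; the error only appears when list() asks for a second pair and the underlying iterator is exhausted.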
    

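    There is something curious going on. The most likely cause is PEP 479, which is enabled by default from Python 3.7: when nxt() runs out of numbers it raises StopIteration inside the generator body, and instead of quietly ending the generator (as it did before Python 3.7), that StopIteration is converted into a RuntimeError that bubbles up to the calling function. Anyway, I rewrote the function in a somewhat more straightforward way.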

    def _gen_georss_coords(value, swap=True, dims=2):
        # A generator of (lon, lat) pairs from a string of encoded GeoRSS
        # coordinates. Converts to floats and swaps order.
        latlons = list(map(float, value.strip().replace(',', ' ').split()))
        # Step by dims (2 or 3 numbers per point) and stop before an
        # incomplete trailing group, so no index runs past the end.
        for i in range(0, len(latlons) - dims + 1, dims):
            t = [latlons[i], latlons[i+1]][::swap and -1 or 1]
            if dims == 3:
                t.append(latlons[i+2])
            yield tuple(t)
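As an alternative to the index-based rewrite, the grouping can also be sketched with the zip(*[iter(seq)] * n) idiom. This is a swapped-in technique, not feedparser's code, and the name gen_georss_coords_zip is hypothetical. Because zip() stops at its shortest input, an incomplete trailing group is dropped and no StopIteration can leak out of the generator.

```python
def gen_georss_coords_zip(value, swap=True, dims=2):
    """Hypothetical alternative: yield coordinate tuples from a GeoRSS
    string by grouping the numbers dims at a time with zip()."""
    numbers = map(float, value.strip().replace(',', ' ').split())
    # The same iterator repeated dims times, so zip() pulls dims items per group.
    for group in zip(*[iter(numbers)] * dims):
        pair = list(group[:2])
        if swap:
            pair.reverse()  # GeoRSS gives "lat lon"; swap to (lon, lat)
        # Append the third value (altitude) unchanged when dims == 3.
        yield tuple(pair + list(group[2:]))


print(list(gen_georss_coords_zip("51.5 -0.12")))  # [(-0.12, 51.5)]
print(list(gen_georss_coords_zip("1 2 3 4 5 6", dims=3)))  # [(2.0, 1.0, 3.0), (5.0, 4.0, 6.0)]
```

To use it instead of the rewrite, it would be assigned to feedparser._gen_georss_coords in the same way as in the patch step.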
    

    You can define the rewritten _gen_georss_coords above in your own code, then run the following (after import feedparser) to patch it into feedparser:

    saveit, feedparser._gen_georss_coords = (feedparser._gen_georss_coords, _gen_georss_coords)
    

    Once you're done with it, you can restore feedparser to its previous state with:

    feedparser._gen_georss_coords, _gen_georss_coords = (saveit, feedparser._gen_georss_coords)
    

    Or, if you're confident this is solid, you can modify feedparser itself. Anyway, I applied this patch and your RSS feed started working. Perhaps it will fix things in your case too.
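The save/patch/restore steps can also be wrapped so the original function is put back even if parsing raises. This is a sketch using the standard contextlib.contextmanager decorator; the helper name patched is hypothetical, and the demo uses a types.SimpleNamespace stand-in rather than the real feedparser module so that it runs without feedparser or network access.

```python
import contextlib
import types


@contextlib.contextmanager
def patched(obj, name, replacement):
    """Hypothetical helper: temporarily replace attribute `name` on `obj`
    (e.g. a module). The finally clause restores the original even if
    the body of the with-statement raises."""
    saved = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield saved
    finally:
        setattr(obj, name, saved)


# Demo on a stand-in object instead of the feedparser module.
fake_module = types.SimpleNamespace(helper=lambda: "original")
with patched(fake_module, "helper", lambda: "patched"):
    print(fake_module.helper())  # patched
print(fake_module.helper())      # original
```

With feedparser 5.2.1 installed, the equivalent would be with patched(feedparser, "_gen_georss_coords", _gen_georss_coords): d = feedparser.parse(url), assuming _gen_georss_coords is still a module-level function of feedparser as in the version used above.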