
Checking for updated RSS feeds with Feedzirra


I am using Feedzirra to parse my RSS feeds and it works very well: in my initial testing it is twice as fast as Feed Normalizer. More importantly, it has nice wrappers that check for updated entries inside a feed. While using its feed-update approach, I ran into some issues:

require 'feedzirra'

feed = Feedzirra::Feed.fetch_and_parse("http://feeds.feedburner.com/TechCrunch")
puts feed.etag # outputs the correct ETag

The above code prints the correct ETag (checked with Firebug). Now, when I want to check for updates, Feedzirra asks for the current ETag, last-modified date, etc. When I give it the right ETag, it says there are no updates, which is great. However, if I don't specify an ETag, it does not store the latest ETag after it fetches the feed. That's a problem: if an update happens and I have a stale ETag, I can never obtain the current ETag short of calling fetch_and_parse, which wastes another fetch.

feed_to_update = Feedzirra::Parser::Atom.new
feed_to_update.feed_url = "http://feeds.feedburner.com/TechCrunch"
feed_to_update.etag = nil
feed_to_update.last_modified = nil

last_entry = Feedzirra::Parser::AtomEntry.new
last_entry.url = nil 

feed_to_update.entries = [last_entry]

updated_feed = Feedzirra::Feed.update(feed_to_update)

puts updated_feed.updated?
puts updated_feed.etag

The above example is a modified version of one from the author's documentation: http://gist.github.com/132671. I also tried supplying a previous ETag value, and it does not get updated. I used nil in the above code because the ETags change frequently for TechCrunch.

The output I get is:

true    

#note the above line is blank (basically printing nil)

Am I doing something wrong or using the functions incorrectly? Or is this a bug in the library? Any other suggestions on how to look for updated feeds?

By the way, I also tried using only the last-modified value, and it always reports new entries even when the date matches the one in the response header.
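For reference, here is a minimal sketch of what an ETag / last-modified check looks like at the HTTP level, independent of Feedzirra. It only builds the conditional request with Ruby's standard library and does not send it. The helper name conditional_request and the sample ETag and date are made up for illustration, and this is not how Feedzirra builds its requests internally.

```ruby
require 'net/http'
require 'time'
require 'uri'

# Hypothetical helper (not part of Feedzirra): build a GET request that
# carries whichever cache validators we have stored for the feed.
def conditional_request(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI(url))
  # The server can answer 304 Not Modified if the ETag still matches.
  request['If-None-Match'] = etag if etag
  # Time#httpdate formats the time as an HTTP date with second precision.
  request['If-Modified-Since'] = last_modified.httpdate if last_modified
  request
end

# Sample values below are invented for the example.
request = conditional_request(
  "http://feeds.feedburner.com/TechCrunch",
  etag: '"example-etag"',
  last_modified: Time.utc(2009, 6, 19, 12, 0, 0)
)

puts request['If-None-Match']
puts request['If-Modified-Since']
```

With no validators supplied, neither header is set, so the server has nothing to compare against and would normally return the full feed.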

Thanks, -e

Update: In the output, I had incorrectly typed 25 above the blank line. I have fixed that now, sorry.


Solution

  • I looked at the source code and found that the etag was not being updated properly. This seems to fix it:

    After the line below (in add_feed_to_multi() of feed.rb)

    feed.update_from_feed(updated_feed) 
    

    I added this line:

    feed.etag = updated_feed.etag 
    

    I still have not found a way to resolve the last_modified issues, but for now ETags are working.
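
    The idea behind that one-line fix can be sketched in isolation. The snippet below uses a plain Struct as a stand-in for the feed objects, so FeedStub and carry_over_validators are hypothetical names and not part of Feedzirra. Copying last_modified as well is an assumption on my part; I have not confirmed that it helps with the last_modified problem.

```ruby
# Stand-in for a parsed feed object; NOT Feedzirra's actual class.
FeedStub = Struct.new(:etag, :last_modified)

# Hypothetical helper: copy the freshly fetched cache validators onto the
# feed object we keep around, so the next update check uses current values.
def carry_over_validators(feed, updated_feed)
  feed.etag = updated_feed.etag if updated_feed.etag
  # Assumption: last_modified is treated the same way as the etag.
  feed.last_modified = updated_feed.last_modified if updated_feed.last_modified
  feed
end

stale = FeedStub.new(nil, nil)                                    # what we had
fresh = FeedStub.new('"abc123"', Time.utc(2009, 6, 19, 12, 0, 0)) # sample values

carry_over_validators(stale, fresh)
puts stale.etag
```

    If the fresh feed carries no validator, the old value is left untouched rather than overwritten with nil.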