Tags: ruby-on-rails, ruby, rss, atom-feed, feedzirra

For Feedzirra, should I be using ID/GUID or ETag/Last-Modified?


I am struggling to decide how to implement Feedzirra. I have two approaches working. Which one should I use?

I have the Railscast #168 Feed Parsing example working. It uses entry.id, which is based on the ID, GUID, or URL, depending on which is available. (By the way, I upgraded this from Rails 2 to Rails 4. It works except for the test scenarios; there is still work to do on it.)

I have the GitHub sample version for Feedzirra operational. It is based on the ETag and Last-Modified date.

These two options seem to be diametrically opposed. Or are they simply two options to choose between depending on the feed? I just don't understand. The documentation, which seems to be dated, is argumentative.

Which is current? Are they both current? Why would I select one or the other? Is one simply better or do I have to use one or the other depending on the feed I am processing?

I hate to ask, but is Feedzirra the right solution for fetching a large number of frequently updated feeds in a high-performance environment? I believe it is, but I am not sure.

I just need to focus on the final solution, whatever that may be at this point.


Solution

  • As a general answer, independent of Feedzirra: they are separate and serve different purposes. ID/GUID are per-item properties defined by the RSS and Atom specifications to identify feed items across fetches. When you fetch a feed again, you can use them to track which items you already received the previous time (see, e.g., "Are RSS guids actually expected to be _globally_ unique?").

    ETag/Last-Modified are HTTP response headers returned when you request a feed. You send their values back on the next request (as If-None-Match and If-Modified-Since) to identify the version of the feed that you already have and to avoid retrieving an unchanged copy (see, e.g., "What is the point of If-Unmodified-Since/If-Modified-Since? Aren't they superseded by ETags?").

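    You should use both: ETag/Last-Modified to skip downloading a feed that has not changed, and ID/GUID to skip items you have already stored.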
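
To make the distinction concrete, here is a minimal sketch of how the two mechanisms can sit side by side. It uses only the Ruby standard library and does not use Feedzirra's own API. The names FeedState, conditional_request, fetch_feed, entry_key, and new_entries are hypothetical, and entries are assumed to be plain hashes with :id and :url keys that some parser has already produced.

```ruby
require 'net/http'
require 'set'
require 'uri'

# Hypothetical per-feed state kept between fetches (illustrative names).
FeedState = Struct.new(:etag, :last_modified, :seen_ids)

# Feed level: build a conditional GET from the validators saved last time,
# so the server can answer 304 Not Modified instead of resending the feed.
def conditional_request(uri, state)
  request = Net::HTTP::Get.new(uri)
  request['If-None-Match'] = state.etag if state.etag
  request['If-Modified-Since'] = state.last_modified if state.last_modified
  request
end

# Perform the request. Returns nil when the feed is unchanged, otherwise
# stores the new validators and returns the response body for parsing.
def fetch_feed(uri, state)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.request(conditional_request(uri, state))
  end
  return nil if response.is_a?(Net::HTTPNotModified)

  state.etag = response['ETag']
  state.last_modified = response['Last-Modified']
  response.body
end

# Item level: prefer the entry's ID/GUID and fall back to its URL.
def entry_key(entry)
  entry[:id] || entry[:url]
end

# Keep only entries whose key has not been seen before, and remember them.
# Set#add? returns nil when the key is already present.
def new_entries(entries, state)
  entries.select { |entry| state.seen_ids.add?(entry_key(entry)) }
end
```

In this sketch, fetch_feed answers "has the feed document changed at all?", while new_entries answers "which items in it are new to me?". A changed feed often still contains mostly old items, which is why the ID/GUID check is still needed after a successful download.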