ruby-on-rails ruby web-scraping api-design

Scrape an entire API or rely heavily on it?


I'm building a resort review site in Rails. Currently, a User has many Reviews, and each Review belongs to a User.

The reviews table contains an expedia_id field. All other data (the hotel name, images, description, etc.) is pulled dynamically from the Expedia API using lookups against this id. For example, the show action in a controller queries both Expedia and my db by expedia_id to get all of the reviews and content, and renders everything on one page. Requests will also be made to populate the home page (I'm thinking of a Featured table with an expedia_id column).

My entire website relies heavily on an API, and I don't have a Resort table. With a large number of users, a lot of requests would be made to the Expedia API. Would it make sense to scrape the API and write the results to my database, creating records for later use?


Solution

  • The middle ground would be the best solution. Create a table and model that store the active resorts locally. Expire each local copy after a set period (determined by how frequently the resorts change on Expedia), and only call the API when a resort is new to your system or when its local copy has expired.

    This is a basic example of how it might be done:

    class Resort < ApplicationRecord # for Rails <= 4, inherit from ActiveRecord::Base
      after_find :maybe_update_from_expedia
      ExpirationTime = 1.day # change to fit what is needed
    
      # Return the local copy, creating it from Expedia on the first lookup.
      def self.find_by_expedia_id(expedia_id)
        find_by(expedia_id: expedia_id) || create_by_expedia_id(expedia_id)
      end
    
      def self.create_by_expedia_id(expedia_id)
        record = new(expedia_id: expedia_id)
        record.maybe_update_from_expedia
        record
      end
      # A bare `private` does not apply to `def self.` methods.
      private_class_method :create_by_expedia_id
    
      # Refresh when the record has never been fetched or has expired.
      def maybe_update_from_expedia
        update_from_expedia if expire_at.nil? || expire_at < Time.now
      end
    
      private
    
      def update_from_expedia
        # fetch record from expedia
        # update local data
        self.expire_at = Time.now + ExpirationTime # same column the check above reads
        save
      end
    end
    

    As suggested by engineersmnky, this can be condensed to:

    class Resort < ApplicationRecord # for Rails <= 4, inherit from ActiveRecord::Base
      # Runs both for new records and for records loaded from the database.
      after_initialize :maybe_update_from_expedia
      ExpirationTime = 1.day # change to fit what is needed
    
      private
    
      # Refresh when the record has never been fetched or has expired.
      def maybe_update_from_expedia
        update_from_expedia if expire_at.nil? || expire_at < Time.now
      end
    
      def update_from_expedia
        # fetch record from expedia
        # update local data
        self.expire_at = Time.now + ExpirationTime # same column the check above reads
        save
      end
    end
    

    This condensed version works as long as all fetch requests use Resort.find_or_create_by(expedia_id: expedia_id).
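
To see the expire-and-refresh logic on its own, here is a minimal plain-Ruby sketch that runs without Rails. It is only an illustration of the idea above, not the model itself: `ResortCache`, `FakeExpediaClient`, `fetch_resort`, and the `ttl_seconds` and `clock` parameters are all made-up names, and the fake client stands in for whatever real Expedia request you end up making.

```ruby
# Hypothetical stand-in for the remote API; counts how often it is hit.
class FakeExpediaClient
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Pretend HTTP request; a real client would call the Expedia API here.
  def fetch_resort(expedia_id)
    @calls += 1
    { expedia_id: expedia_id, name: "Resort #{expedia_id}" }
  end
end

# In-memory cache keyed by expedia_id, with a per-entry expiry time.
class ResortCache
  Entry = Struct.new(:data, :expire_at)

  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(client, ttl_seconds:, clock: -> { Time.now })
    @client = client
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Return the cached resort, calling the API only when the entry is
  # missing or past its expiry time.
  def find(expedia_id)
    now = @clock.call
    entry = @entries[expedia_id]
    if entry.nil? || entry.expire_at < now
      entry = Entry.new(@client.fetch_resort(expedia_id), now + @ttl)
      @entries[expedia_id] = entry
    end
    entry.data
  end
end
```

Calling `find` twice within the TTL should hit the client once; a call after the TTL has passed triggers a second fetch. In the Rails version the database row plays the role of `Entry`, and the expiry column plays the role of `expire_at`.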