ruby-on-rails model rake

Architecture Question - Where to put the scrape task


I'm building an application that visits a website once a day and saves the contents of a particular table on that site to a database that I've set up. The scrape currently lives in a class method on my model, and a rake task calls that class method once per day.

While my code 'works' and I collect the information once per day, it feels strange to leave the scraping logic in my model, and I'd like to know whether there is a better place for it.

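require 'nokogiri'
require 'open-uri'

class WebTable < ApplicationRecord

    # Fetches the prize table and saves one record per table row.
    def self.scrape_and_save_table_information
        # URI.open comes from open-uri; bare Kernel#open no longer fetches URLs on Ruby 3.0+.
        doc = Nokogiri::HTML(URI.open('https://www.calottery.com/play/scratchers-games/top-prizes-remaining'))
        rows = doc.css("tbody tr")
        rows.each do |row|
            row_object = {}
            # The child indices below depend on the page's current markup.
            row_object["cell_one"] = row.children[1].children[0].to_s
            row_object["cell_two"] = row.children[2].children[0].children.to_s
            row_object["cell_three"] = row.children[7].children[0].children[0].to_s
            WebTable.create(row_object)
        end
    end

end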

My rake task looks like this:

desc 'scraping webpages'
task scrape_web_pages: :environment do
    WebTable.scrape_and_save_table_information
end

Solution

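  • Sidekiq workers tend to work quite well here (pun intended). When the task loops over items, as yours does over table rows, the main worker can enqueue a separate worker for each item. That can improve throughput, and a failure in one row is isolated to its own job instead of aborting the whole run.

    For example: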

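    # Main worker: enqueues one OtherWorker job per row.
    # OtherWorker is a second worker class (not shown) whose perform(row) handles a single row.
    class HardWorker
      include Sidekiq::Worker

      def perform
        ['nice', 'rows'].each do |row|
          OtherWorker.perform_async(row)
        end
      end
    end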
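
To see the fan-out pattern above end to end without Redis or Sidekiq installed, here is a minimal plain-Ruby sketch. FakeQueue, RowWorker, ScrapeWorker, and the sample rows are hypothetical names made up for illustration. FakeQueue only imitates the shape of perform_async and is not how Sidekiq is implemented. In a real app you would presumably include Sidekiq::Worker instead and save through WebTable.create.

```ruby
# In-memory stand-in for a job queue (hypothetical, for illustration only).
# With Sidekiq, perform_async would push the job to Redis instead.
module FakeQueue
  JOBS = []

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Imitates the shape of perform_async: remember the class and its arguments.
    def perform_async(*args)
      JOBS << [self, args]
    end
  end

  # Runs every queued job and collects errors instead of stopping at the first one.
  def self.drain
    errors = []
    until JOBS.empty?
      klass, args = JOBS.shift
      begin
        klass.new.perform(*args)
      rescue StandardError => e
        errors << "#{klass}#{args.inspect}: #{e.message}"
      end
    end
    errors
  end
end

# Per-row worker: saves a single row. SAVED stands in for WebTable.create.
class RowWorker
  include FakeQueue
  SAVED = []

  def perform(cells)
    raise ArgumentError, 'empty first cell' if cells.first.to_s.empty?
    SAVED << { 'cell_one' => cells[0], 'cell_two' => cells[1] }
  end
end

# Main worker: fans out one RowWorker job per scraped row.
class ScrapeWorker
  include FakeQueue

  def perform(rows)
    rows.each { |cells| RowWorker.perform_async(cells) }
  end
end

# Made-up rows in place of the real scrape; the second one is deliberately bad.
ScrapeWorker.perform_async([['Game A', '3'], ['', '1'], ['Game B', '7']])
errors = FakeQueue.drain
puts "saved=#{RowWorker::SAVED.size} errors=#{errors.size}"
```

Running this prints saved=2 errors=1: the row with the empty first cell fails in its own job while the other two are still saved. The rows are passed as arrays of strings rather than Nokogiri nodes because real Sidekiq job arguments need to be simple JSON types.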