I'm building an application that visits a website once a day and saves the contents of a particular table on that site to a database I've set up. I've implemented the scrape as a class method on my model, and a rake task calls that method once per day.
While my code 'works' and I collect the information each day, it feels wrong to leave the scraping logic in the model, and I'm curious whether there's a preferable way to structure this.
require 'open-uri'

class WebTable < ApplicationRecord
  def self.scrape_and_save_table_information
    # Kernel#open no longer accepts URLs in recent Rubies; use URI.open
    doc = Nokogiri::HTML(URI.open('https://www.calottery.com/play/scratchers-games/top-prizes-remaining'))
    rows = doc.css("tbody tr")
    rows.each do |row|
      row_object = {}
      row_object["cell_one"] = row.children[1].children[0].to_s
      row_object["cell_two"] = row.children[2].children[0].children.to_s
      row_object["cell_three"] = row.children[7].children[0].children[0].to_s
      create(row_object)
    end
  end
end
My rake task looks like this:
desc 'scraping webpages'
task scrape_web_pages: :environment do
  WebTable.scrape_and_save_table_information
end
Sidekiq workers tend to work quite well (pun intended). For loops in particular, you can fan out one job per item from a main worker, which improves throughput and makes errors easier to handle: if one row's job fails, the other jobs still run and only the failed one is retried.
e.g.
class HardWorker
  include Sidekiq::Worker

  def perform
    ['nice', 'rows'].each do |row|
      OtherWorker.perform_async(row)
    end
  end
end
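On the original question of where the scraping logic should live: a common complement to the worker approach is to pull the fetch-and-parse logic out of the model into a plain Ruby service object, so the model only persists rows and the worker only schedules work. Here is a minimal sketch (the class name `ScrapeTopPrizes` and the injected callables are my own invention, not from your code); injecting the fetcher and parser keeps the object testable without touching the network:

```ruby
# Hypothetical service object: fetching and parsing are injected as
# callables, so in tests they can be replaced with stand-in lambdas.
class ScrapeTopPrizes
  def initialize(fetcher:, parser:)
    @fetcher = fetcher # callable returning raw HTML
    @parser  = parser  # callable turning HTML into an array of row hashes
  end

  # Returns row hashes ready to be handed to WebTable.create
  def call
    @parser.call(@fetcher.call)
  end
end

# Usage with stand-ins; in the real app the fetcher would wrap URI.open
# and the parser would wrap the Nokogiri row-extraction logic.
fetcher = -> { "<tr><td>A</td></tr>" }
parser  = ->(html) { html.include?("<td>A</td>") ? [{ "cell_one" => "A" }] : [] }
rows = ScrapeTopPrizes.new(fetcher: fetcher, parser: parser).call
```

With this split, the rake task (or the Sidekiq worker) becomes a thin scheduler, and each piece can be unit-tested in isolation.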