Tags: ruby-on-rails, sidekiq, rake-task

Using Sidekiq with a running (dynamic) counter in Rails


I'm building a web crawler that (later on) uses these links to extract information.

The current rake task goes through all possible pages one by one and checks whether each request goes through (a valid response) or returns a 404/503 error (an invalid page). If the page is valid, its URL gets saved to my database. As you can see, the task requests 50,000 pages in total and therefore takes quite some time.

I have read about Sidekiq and how it can perform these tasks asynchronously, which should make this a lot faster.

My question: as you can see, my task increments the counter on every loop iteration. I guess this won't work with Sidekiq, since Sidekiq would just run several independent copies of this script. Am I right?

How would I get around the problem of each instance needing its own counter, then?
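To see why the concern is justified, here is a plain-Ruby sketch with no Sidekiq involved. `run_independent_copy` is a hypothetical stand-in for one copy of the task's loop, scaled down to 5 pages: when each copy starts its own local counter at 1, every copy walks the same page numbers, so the work is duplicated rather than divided.

```ruby
# Hypothetical stand-in for one copy of the rake task's loop: each copy
# starts its own local counter at 1, just like `counter = 1` in the task.
def run_independent_copy(last_page)
  visited = []
  counter = 1
  until counter > last_page
    visited << counter # a real copy would request and save this page
    counter += 1
  end
  visited
end

# Simulate three independent jobs over a scaled-down range of 5 pages.
copies = Array.new(3) { run_independent_copy(5) }
all_visits = copies.flatten

# Count how often each page number was visited across all copies.
puts all_visits.tally.inspect
```

Every page number shows up three times, once per copy, which is why the copies need some way to coordinate which numbers each one takes.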

Hopefully my question makes sense. Thank you very much!

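desc "Validate Pages"
task validate_url: :environment do
  require 'open-uri'
  require 'logger'

  base_url = "http://example.net/file"
  logger   = Logger.new("validations.log") # create the logger once, up front

  # Check pages 1..50,000 one by one; the page number is the block argument,
  # so no manual `counter += 1` is needed in each branch.
  (1..50_000).each do |counter|
    url = "#{base_url}_#{counter}/"

    begin
      # URI.open is used because Kernel#open no longer opens URLs as of Ruby 3.0.
      # It raises OpenURI::HTTPError for error responses such as 404 or 503;
      # the block form closes the response once we are done with it.
      URI.open(url) { |_response| }

      Page.create!(url: url)
      puts "Saved #{url} !"
    rescue OpenURI::HTTPError => ex
      # Only 503s are written to the log file; every error is printed.
      logger.info "#{ex} @ #{counter}" if ex.io.status[0] == "503"
      puts "#{ex} @ #{counter}"
    rescue SocketError => ex
      logger.info "#{ex} @ #{counter}"
      puts "#{ex} @ #{counter}"
    end
  end
end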

Solution

  • A simple Redis INCR operation atomically creates (starting from 0) and increments a global counter that all of your jobs can share. You can use Sidekiq's Redis connection to implement such a counter trivially:

    # INCR is atomic, so every job gets a different number back
    page_number = Sidekiq.redis do |conn|
      conn.incr("my-counter")
    end
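What makes INCR suitable is that the increment and the read happen as one atomic step, so no two jobs can ever receive the same number. The sketch below mimics that in plain Ruby so it runs without Redis or Sidekiq: a hypothetical `AtomicCounter` guarded by a `Mutex` stands in for the Redis key, and four threads stand in for Sidekiq jobs, each claiming page numbers until the (scaled-down) last page is passed. In a real job, `counter.incr` would be replaced by the `Sidekiq.redis` call shown in the answer.

```ruby
# In-process stand-in for Redis INCR (assumption: a Mutex-guarded integer;
# real Sidekiq jobs would call conn.incr on the shared Redis server instead).
class AtomicCounter
  def initialize
    @value = 0
    @lock = Mutex.new
  end

  # Like INCR: increments and returns the new value in one atomic step,
  # so two callers can never receive the same number.
  def incr
    @lock.synchronize { @value += 1 }
  end
end

LAST_PAGE = 1_000 # scaled down from 50,000 for the demo
counter = AtomicCounter.new
claimed = Queue.new # thread-safe collection of the page numbers workers took

workers = Array.new(4) do
  Thread.new do
    loop do
      page = counter.incr
      break if page > LAST_PAGE # every page has been handed out
      claimed << page           # a real job would check and save this page
    end
  end
end
workers.each(&:join)

pages = []
pages << claimed.pop until claimed.empty?
puts pages.size # → 1000
```

Each page number is claimed exactly once, no matter how the threads interleave; with Redis, the same guarantee holds across separate Sidekiq processes.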