Ruby Mechanize Stops Working while in Each Do Loop


I am using a Ruby Mechanize script to loop through about 1,000 records in a tab-delimited file. Everything works as expected until I reach about 300 records.

Once I get to about 300 records, every request falls into the rescue branch and the script eventually stops working. I thought it was because I had not set max_history properly, but that doesn't seem to make a difference.

Here is the error message that I start getting:

getaddrinfo: nodename nor servname provided, or not known

Any ideas on what I might be doing wrong here?

require 'mechanize'

result_counter = 0
# Count the rows up front for the progress output; File.foreach closes the file itself.
total_rows = File.foreach(ARGV[0]).count

mechanize = Mechanize.new { |agent|
  agent.open_timeout   = 10
  agent.read_timeout   = 10
  agent.max_history = 0
}

File.foreach(ARGV[0]) do |line|
  item = line.split("\t").map(&:strip)
  website = item[16]
  name = item[11]

  if website
    begin
      page = mechanize.get(website)

      primary1 = page.link_with(text: 'text')
      secondary1 = page.link_with(text: 'other_text')

      if primary1
        # Follow the primary link and record where it leads.
        page_found = primary1.click.uri
        result_counter += 1
        STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name}"
      elsif secondary1.nil?
        # Neither link is present on the page.
        result_counter += 1
        STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - No"
      end
    rescue Timeout::Error
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Timeout"
    rescue => e
      STDERR.puts e.message
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Rescue"
    end
  end
end

Solution

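  • You get this error because the connection is not closed after each request. Mechanize keeps connections alive by default so it can reuse them, and with hundreds of different websites those open connections most likely accumulate until new requests start to fail.

    Turning keep-alive off should fix your problem: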

    mechanize = Mechanize.new { |agent|
      agent.open_timeout = 10
      agent.read_timeout = 10
      agent.max_history  = 0
      agent.keep_alive   = false
    }
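
If you want to check whether leftover connections really are the cause, one rough approach is to watch how many file descriptors the script holds as the loop runs. The sketch below uses only the Ruby standard library. It assumes the failure comes from the process running into its open-file limit, which the answer above does not state outright, and descriptor_report is a hypothetical helper name, not part of Mechanize.

```ruby
# Hypothetical diagnostic helper (not part of Mechanize): reports the
# process's open-file limits and, where /dev/fd is available (Linux, macOS),
# roughly how many descriptors are currently open.
def descriptor_report
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)

  # Listing /dev/fd briefly opens a descriptor of its own, so treat the
  # count as approximate. nil means the platform does not expose /dev/fd.
  open_count = Dir.exist?('/dev/fd') ? Dir.children('/dev/fd').size : nil

  { soft_limit: soft, hard_limit: hard, open: open_count }
end

report = descriptor_report
STDERR.puts "open descriptors: #{report[:open].inspect} (soft limit #{report[:soft_limit]})"
```

Calling descriptor_report every 50 rows or so inside the loop would show whether the open count keeps climbing toward the soft limit with keep_alive left on, and whether it stays flat once keep_alive is set to false.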