ruby, web-scraping, mechanize

Mechanize::ResponseCodeError (404 => Net::HTTPNotFound unhandled response):


I am trying to scrape images from https://en.wikipedia.org/ using the mechanize gem. When I try to calculate the size of each image, I get Mechanize::ResponseCodeError (404 => Net::HTTPNotFound for https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/FP2A3620_%252823497688248%2529.jpg/119px-FP2A3620_%252823497688248%2529.jpg -- unhandled response).

Here is my code:

    def images
      agent = Mechanize.new
      page = agent.get("https://en.wikipedia.org/")
      page.images.each do |image|
        puts image.url
        size = agent.head(image)["content-length"].to_i/1000
      end
    end

Any help is appreciated.


Solution

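  • I looked up that image on Wikipedia and it renders just fine. I opened it in a new tab and compared the URL in the browser with the one Mechanize was requesting.

    Unescaping the URL did the trick. The URL in the error message is percent-encoded twice: %2528 is an encoded %28, which is itself an encoded "(". Decoding it once yields the single-encoded form.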

    image_url = CGI.unescape(image.url.to_s)
    size = agent.head(image_url)["content-length"].to_i/1000
    

    Here is a working Replit.
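
To see what the unescape step changes without making any network requests, here is a minimal sketch that applies CGI.unescape to the URL from the error message. The variable names double_encoded and single_encoded are only illustrative and do not come from the original code.

```ruby
require "cgi"

# URL exactly as it appears in the error message: the parentheses in the
# file name are percent-encoded twice ("(" -> "%28" -> "%2528").
double_encoded = "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f5/" \
                 "FP2A3620_%252823497688248%2529.jpg/" \
                 "119px-FP2A3620_%252823497688248%2529.jpg"

# One pass of CGI.unescape removes one layer of encoding ("%25" -> "%").
single_encoded = CGI.unescape(double_encoded)

puts single_encoded
# The file name now contains %28 and %29 instead of %2528 and %2529.
```

Note that CGI.unescape also turns "+" into a space, so this approach is only a reasonable fit for URLs like this one that contain no literal plus signs.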