ruby, web-scraping, open-uri

Ruby: cannot open a link that works in a web browser


I am trying to download a picture from the site mangafox. The picture displays fine in web browsers, but I keep getting errors with Ruby.

So far, I have tried this:

require 'open-uri'
require 'pp'

def get_page(link)
  page = nil
  begin
    # URI.open is the open-uri entry point; Kernel#open no longer accepts URLs on Ruby 3.0+
    page = URI.open(link, 'User-Agent' => "Ruby/#{RUBY_VERSION}")
  rescue StandardError => e
    puts e.class.to_s
    puts e.message
  end
  return page
end

link = 'http://h.mfcdn.net/store/manga/9/14-116.0/compressed/Bleach-14-116[manga-rain]._manga_rain_bleach_ch116_01.jpg?token=24530ad3411b28ed7f5ef17f932e8713&ttl=1494853200'
# Tried this after some research online, because some characters (such as '[' or ']') are rejected in URLs
link2 = link.gsub(/[\[\]]/) { '%%%s' % $&.ord.to_s(16) }.chomp

pp get_page(link)
pp get_page(link2)

But I get this output:

URI::InvalidURIError
bad URI(is not URI?): http://h.mfcdn.net/store/manga/9/14-116.0/compressed/Bleach-14-116[manga-rain]._manga_rain_bleach_ch116_01.jpg?token=24530ad3411b28ed7f5ef17f932e8713&ttl=1494853200
nil
OpenURI::HTTPError
403 Forbidden
nil


Solution

  • OpenURI is fine in a pinch, but you'd be better served by a more robust HTTP client such as Typhoeus (or Net::HTTP from the standard library). With Typhoeus:

    require 'typhoeus' # gem install typhoeus

    response = Typhoeus.get('http://h.mfcdn.net/store/manga/9/14-116.0/compressed/Bleach-14-116[manga-rain]._manga_rain_bleach_ch116_01.jpg?token=24530ad3411b28ed7f5ef17f932e8713&ttl=1494853200')
    response.body #=> binary image data
    

    (Note: I tested this before sharing, and it loads fine.)
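
If you would rather stay in the standard library, here is a rough sketch of the Net::HTTP route. It percent-encodes the square brackets so that URI.parse accepts the link, then builds a GET request with an explicit User-Agent. The helper names `escape_brackets` and `build_get_request` and the `'Mozilla/5.0'` header value are placeholders of mine, not part of any library. Note that escaping alone still produced a 403 with OpenURI in the question, and whether the server's 403 depends on the User-Agent or on the token having expired is an assumption I have not verified.

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: percent-encode only the square brackets, which
# URI.parse rejects in a path, and leave the rest of the URL untouched.
def escape_brackets(url)
  url.gsub(/[\[\]]/) { |ch| format('%%%02X', ch.ord) }
end

# Hypothetical helper: build (but do not send) a GET request with an
# explicit User-Agent. The default header value is a placeholder assumption.
def build_get_request(url, user_agent: 'Mozilla/5.0')
  uri = URI.parse(escape_brackets(url))
  request = Net::HTTP::Get.new(uri)
  request['User-Agent'] = user_agent
  [uri, request]
end

# Sending the request would look like this (needs network access):
#   uri, request = build_get_request(link)
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
#   File.binwrite('page.jpg', response.body) if response.is_a?(Net::HTTPSuccess)
```

Building the request separately from sending it makes it easy to inspect the final path and headers before touching the network, which helps when working out why a server answers 403.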