I am trying to get the exact URL of an image inside a page and then download it. I haven't gotten to the download step yet, as I am still trying to isolate the URL of the image. Here is the code:
#!/usr/bin/ruby -w
require 'rubygems'
require 'hpricot'
require 'open-uri'
raw = Hpricot(open("http://www.amazon.com/Weezer/dp/B000003TAW/"))
ele = raw.search("img[@src*=jpg]").first
img = ele.match("(\")(.*?)(\")").captures
puts img[1]
When I run it as is, I receive:
undefined method `match' for #<Hpricot::Elem:0xb731948c> (NoMethodError)
If I comment out the last 2 lines and add
puts ele
I get:
<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />
which is the correct portion of the page I want to parse. However, the error occurs when I try to get just the "http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" part.
I am not entirely sure why it can't perform a match. As I understand it, the search I am running should get an array of the image elements and return the first one. I assumed that I could not run the match on the entire array, so I tried
img = ele[1].match("(\")(.*?)(\")").captures
puts img
and that returns
undefined method `match' for nil:NilClass (NoMethodError)
I am lost. Please excuse my ignorance, as I am just beginning to learn Ruby. Any help is appreciated.
Change this line:
img = ele.match("(\")(.*?)(\")").captures
To:
img = ele[:src]
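The reason for the errors is that Hpricot::Elem
isn't a String, so it has no match method. Also, ele is already a single element (the result of .first), not an array, so ele[1] does not pick an image out of a list; it returned nil here. Try:
ele.respond_to? :match
and you get false.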
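To see the same check without installing Hpricot, here is a small sketch. FakeElem is a hypothetical stand-in invented for this illustration (it is not part of Hpricot); it only mimics the relevant property, namely an object that can print itself as markup but is not a String.

```ruby
# FakeElem is a hypothetical stand-in for Hpricot::Elem, made up for this
# example: it renders as markup via to_s but defines no String methods.
class FakeElem
  def initialize(markup)
    @markup = markup
  end

  def to_s
    @markup
  end
end

ele = FakeElem.new('<img src="http://example.com/cover.jpg" />')

# A plain object does not know how to match a pattern...
puts ele.respond_to?(:match)       # false
# ...but the String returned by to_s does.
puts ele.to_s.respond_to?(:match)  # true
```

This is why puts ele looks like a string on screen: puts calls to_s for you, while match does not.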
However, you could do:
ele.to_s.match("(\")(.*?)(\")").captures[1]
The secret is in the to_s: it turns the element into its markup string, and a String does have match.
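If you do go the regex route, a slightly more targeted pattern may be easier to live with. The sketch below works on the markup string from the question (what ele.to_s would give you) and anchors on src= rather than on the first pair of quotes, so it should not depend on src being the first attribute. The variable names are just for illustration.

```ruby
# The <img> markup printed in the question, as a plain String.
html = '<img src="http://ecx.images-amazon.com/images/I/51rpVNqXmYL._SL500_AA240_.jpg" style="display:none;" />'

# A Regexp literal avoids escaping the quotes inside a String pattern.
# The single capture group holds everything between src=" and the next ".
match = html.match(/src="([^"]*)"/)

# match is nil when the pattern is not found, so guard before indexing.
url = match ? match[1] : nil
puts url
```

For this particular case ele[:src] is still the simpler choice, since the parser has already separated the attributes for you.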