I'm trying to run the sample from the Sinew source code, but it's not working on my machine. Here is the sample (taken directly from their GitHub repository):
get "http://www.amazon.com/gp/bestsellers/books/ref=sv_b_3"
noko.css(".zg_itemRow").each do |item|
  row = { }
  row[:url] = item.css(".zg_title a").first[:href]
  row[:title] = item.css(".zg_title")
  row[:img] = item.css(".zg_itemImage_normal img").first[:src]
  csv_emit(row)
end
I'm using Ubuntu 12.04 with Ruby 1.9.3 and RVM. Here is what I typed, followed by the error:
jefferton@ubuntu:~/IdeaProjects/sinew_scrape$ sinew sell_list.sinew
curl http://www.amazon.com/gp/bestsellers/books/ref=sv_b_3
/home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/lib/sinew/text_util.rb:48:in `popen': No such file or directory - tidy -asxml -bare -quiet -utf8 -wrap 0 --doctype omit --hide-comments yes --force-output yes -f /dev/null (Errno::ENOENT)
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/lib/sinew/text_util.rb:48:in `html_tidy'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/lib/sinew/main.rb:33:in `html'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/lib/sinew/main.rb:59:in `noko'
from sell_list.sinew:9:in `_run'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/lib/sinew/main.rb:121:in `instance_eval'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/lib/sinew/main.rb:121:in `_run'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/lib/sinew/main.rb:16:in `initialize'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/bin/sinew:19:in `new'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/bin/sinew:19:in `block in <top (required)>'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/bin/sinew:18:in `each'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/gems/sinew-1.0.2/bin/sinew:18:in `<top (required)>'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/bin/sinew:19:in `load'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/bin/sinew:19:in `<main>'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/bin/ruby_noexec_wrapper:14:in `eval'
from /home/jefferton/.rvm/gems/ruby-1.9.3-head/bin/ruby_noexec_wrapper:14:in `<main>'
I wish I knew a more specific thing to ask, but I'm not sure what to do here.
Thanks.
That library might be worth looking into, but I can't see why it would use curl rather than Mechanize, or what HTML Tidy is supposed to be for. Shelling out to executables like that is a fragile approach, and your trace shows why: popen raises Errno::ENOENT because it cannot find a tidy executable on your machine. My advice is to avoid it and use Mechanize instead.
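If you still want to try Sinew, the trace suggests the tidy program simply isn't installed (on Ubuntu it is most likely packaged as tidy, so something like sudo apt-get install tidy should provide it, though I haven't checked that on 12.04). As a rough sketch of how a script can check for an external tool before shelling out, here is a small standard-library example. The helper names find_executable and require_executable! are my own for illustration and are not part of Sinew:

```ruby
# Return the full path of an executable found on PATH, or nil if it is missing.
# `find_executable` is a hypothetical helper, not part of Sinew or any gem.
def find_executable(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    # Only accept regular files that the current user may execute.
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Fail early with a readable message instead of a bare Errno::ENOENT from popen.
def require_executable!(name, path = ENV.fetch("PATH", ""))
  find_executable(name, path) || raise("#{name} not found on PATH; install it first")
end

# Report whether the tool named in the stack trace is available.
puts find_executable("tidy") ? "tidy is installed" : "tidy is missing"
```

Running this before the sinew command should tell you whether the missing binary is really the cause of the error.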