Here's what I've done so far:
sudo gem install scrapi
sudo gem install tidy
This didn't work because the gem didn't include libtidy.dylib, so I did this:
sudo port install tidy
sudo cp libtidy.dylib /Library/Ruby/Gems/1.8/gems/scrapi-1.2.0/lib/tidy/libtidy.dylib
Then I started following the simple Railscast at: http://media.railscasts.com/videos/173_screen_scraping_with_scrapi.mov
Right after Mr. Bates saved scrapitest.rb for the first time, I tried to run this code:
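require 'rubygems'
require 'uri'
require 'scrapi'
# Define a scraper that extracts the text of the page's <title> element
scraper = Scraper.define do
  process "title", :page_name => :text
  result :page_name
end
uri = URI.parse("http://www.walmart.com/search/search-ng.do?search_query=lost+season+3&ic=48_0&search_constraint=0")
# Fetch the page and print the scraped title
p scraper.scrape(uri)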
I ran it with this command:
ruby scrapitest.rb
and it crashed with this error:
/Library/Ruby/Gems/1.8/gems/tidy-1.1.2/lib/tidy/tidybuf.rb:39: [BUG] Segmentation fault
ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0]
Abort trap
I'm completely out of ideas.
I had this issue too, followed by a second one where the segfault occurred non-deterministically.
I followed the steps in this RubyForge tracker item: http://rubyforge.org/tracker/index.php?func=detail&aid=10007&group_id=435&atid=1744
In tidy-1.1.2/lib/tidy/tidylib.rb:
1. Add this line to the 'load' method in Tidylib:
extern "void tidyBufInit(void*)"
2. Define a new method called 'buf_init' in Tidylib:
# tidyBufInit, using default allocator
#
def buf_init(buf)
  tidyBufInit(buf)
end
Then, in tidy-1.1.2/lib/tidy/tidybuf.rb:
3. Add this line to the initialize method of Tidybuf below the malloc:
Tidylib.buf_init(@struct)
so that it looks like this:
def initialize
  # Allocate the struct, then let libtidy initialize it with the default allocator
  @struct = TidyBuffer.malloc
  Tidylib.buf_init(@struct)
end
4. For completeness, make Brennan's change by adding the allocator field to the TidyBuffer struct so that it looks like this:
TidyBuffer = struct [
  "TidyAllocator* allocator",
  "byte* bp",
  "uint size",
  "uint allocated",
  "uint next"
]
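Steps 1 and 2 use the DL library that ships with Ruby 1.8: extern makes the C function callable from Ruby, and buf_init is a thin Ruby wrapper around it. DL was removed in Ruby 2.2 in favor of Fiddle, whose Fiddle::Importer offers the same extern call. The sketch below shows that pattern with Fiddle. Because libtidy may not be installed, it binds libc's strlen instead of tidyBufInit, and the LibC module and str_length wrapper are hypothetical names used only for illustration.

```ruby
require 'fiddle'
require 'fiddle/import'

# Hypothetical module following the Tidylib pattern from steps 1 and 2.
module LibC
  extend Fiddle::Importer
  # Fiddle.dlopen(nil) opens the running Ruby process, whose symbols include libc.
  dlload Fiddle.dlopen(nil)

  # Like step 1: declare the C function so Ruby can call it.
  extern 'size_t strlen(char*)'

  # Like step 2: a Ruby-named wrapper around the raw C call.
  def self.str_length(str)
    strlen(str)
  end
end

puts LibC.str_length('hello') # prints 5
```

With tidy 1.1.2 on Ruby 1.8 you would keep the DL code from steps 1 and 2; this only shows how the same binding looks on a Ruby where DL no longer exists.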
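Why step 4 matters: in libtidy's buffio.h, TidyBuffer begins with a TidyAllocator* field, followed by bp, size, allocated and next. If the Ruby-side struct omits that first field, Ruby reads each field at an offset one pointer width lower than where libtidy writes it (so Ruby's bp is really libtidy's allocator), and libtidy writes past the end of the smaller struct Ruby allocated, which can corrupt memory. The pure-Ruby sketch below computes the field offsets of both layouts to show the shift. It assumes each field is aligned to its own size (true for pointers and unsigned ints on the usual ABIs), and field_offsets is a hypothetical helper, not part of the tidy gem.

```ruby
# Native sizes taken from Array#pack ('J' = pointer-width unsigned, Ruby 2.3+;
# 'I' = native unsigned int).
PTR_SIZE  = [0].pack('J').bytesize
UINT_SIZE = [0].pack('I').bytesize

# Hypothetical helper: C-style field offsets, assuming each field is aligned
# to its own size.
def field_offsets(fields)
  offset = 0
  fields.each_with_object({}) do |(name, kind), offsets|
    size = kind == :ptr ? PTR_SIZE : UINT_SIZE
    offset = (offset + size - 1) / size * size # round up to the field's alignment
    offsets[name] = offset
    offset += size
  end
end

# Layout declared by the tidy 1.1.2 gem (no allocator field).
OLD_LAYOUT = [[:bp, :ptr], [:size, :uint], [:allocated, :uint], [:next, :uint]]
# Layout libtidy uses, which is what step 4 declares.
NEW_LAYOUT = [[:allocator, :ptr]] + OLD_LAYOUT

old_offsets = field_offsets(OLD_LAYOUT)
new_offsets = field_offsets(NEW_LAYOUT)

%i[bp size allocated next].each do |field|
  puts format('%-10s gem=%2d libtidy=%2d', field, old_offsets[field], new_offsets[field])
end
```

On a 64-bit build this shows bp at offset 0 in the gem's layout but 8 in libtidy's, with every other shared field shifted by the same 8 bytes; on a 32-bit build the shift is 4.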