Some friends and I have been working on a set of scripts that make it easier to work on the machines at uni. One of these tools currently uses Nokogiri, but we want the tools to run on every machine with as little setup as possible, so we've been trying to find a 'native' HTML parser instead of requiring users to install RVM and custom gems (most users have tight disk space limits).
Are we pretty much restricted to Nokogiri, Hpricot and the like? Or should we look at writing our own custom parser that fits our needs?
Cheers.
EDIT: If there are posts on here that I've missed in my searches, let me know! S.O. is sometimes just too large to search effectively...
There is no HTML parser in the Ruby stdlib.
HTML parsers have to be more forgiving of bad markup than XML parsers.
You could run the HTML through tidy (http://tidy.sourceforge.net) to clean it up and produce valid, well-formed markup.
The tidied output can then be read with REXML :-) which is in the stdlib.
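Here is a rough sketch of the tidy-then-REXML idea, in case it helps. It assumes the `tidy` command-line tool is installed and on the PATH, and that its `-asxml`, `-numeric`, `-utf8` and `-quiet` options behave as documented. The helper names `tidy_to_xhtml` and `extract_links` are made up for illustration, so treat this as a starting point rather than a tested solution.

```ruby
require 'open3'
require 'rexml/document'

# Hypothetical helper: pipe raw HTML through the external `tidy` binary
# (assumed to be on PATH) and return well-formed XHTML as a string.
def tidy_to_xhtml(html)
  # -asxml   : emit well-formed XHTML
  # -numeric : numeric entities instead of named ones, which REXML would reject
  # -utf8    : treat input and output as UTF-8
  # -quiet   : suppress non-essential output
  xhtml, _warnings, status = Open3.capture3(
    'tidy', '-asxml', '-numeric', '-utf8', '-quiet', stdin_data: html
  )
  # tidy exits with 0 (clean) or 1 (warnings only); 2 means it hit errors.
  raise 'tidy could not repair the markup' unless [0, 1].include?(status.exitstatus)
  xhtml
end

# Hypothetical helper: collect the href of every <a> element using only REXML.
# Elements are compared by local name, so the XHTML default namespace that
# tidy adds to <html> does not get in the way.
def extract_links(xhtml)
  doc = REXML::Document.new(xhtml)
  links = []
  doc.root.each_recursive do |element|
    href = element.attributes['href']
    links << href if element.name == 'a' && href
  end
  links
end
```

Usage would be something like `extract_links(tidy_to_xhtml(File.read('page.html')))`, where `page.html` is a placeholder filename. Walking the tree with `each_recursive` instead of XPath is a deliberate choice here: it is slower on big documents, but it avoids namespace surprises.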
REXML is much slower than Nokogiri, at least when I last checked in 2009.
Sam Ruby had been working on making REXML faster, though.
A better approach would be to improve your deployment.
Take a look at http://gembundler.com/bundle_package.html, and consider using Capistrano (or something similar) to provision the servers.
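For the Bundler route, a minimal Gemfile might look like the sketch below. The gem list is an assumption based on the question (only Nokogiri is mentioned), and the workflow in the comments is one possible setup, not the only one.

```ruby
# Gemfile (sketch; assumes Nokogiri is the only third-party dependency)
source 'https://rubygems.org'

gem 'nokogiri'

# Possible workflow:
#   bundle package          # on a dev machine: copy the .gem files into vendor/cache
#   bundle install --local  # on each target machine: install from vendor/cache only
```

Shipping `vendor/cache` with the scripts means the target machines never have to download gems. Nokogiri's native extension may still need to be compiled on each machine, though, so this cuts the setup work but may not remove it entirely.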