I'm working with a number of malformed HTML pages. At least, I presume they're malformed because when I parse them in Nokogiri and then execute to_html, elements don't appear correctly anymore. When I parse them with Hpricot, however, they display correctly.
I'd rather not use Hpricot because it appears to be impossible to add Hpricot::Elem instances to a document (without converting them to strings, adding, then parsing again).
Can I disable Nokogiri's error correction so that I can preserve the HTML closer to the way it was written?
Your XHTML is not valid XHTML. If I copy the contents from http://pastie.org/2638305, save them as 'foo.xhtml' and then attempt to open them in Chrome, I see:
This page contains the following errors:
error on line 768 at column 39: attributes construct error
If I look on line 768 then I see (truncated):
<img src="..." alt="Talk to us now!"http://wholesaleinsurance.net/>
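As you can see, that is not well-formed: the text http://wholesaleinsurance.net/ follows the closing quote of the alt attribute without belonging to any attribute.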
You claim that you ran the page through validator.w3.org, but when I do that with the contents of your pastie I get:
Errors found while checking this document as XHTML 1.0 Strict!
Result: 15 Errors, 3 warning(s)
So... is the content you are actually parsing different from what you put in the pastie?