html ruby nokogiri hpricot

Disable error correction in Nokogiri


I'm working with a number of malformed HTML pages. At least, I presume they're malformed, because when I parse them with Nokogiri and then call to_html, some elements no longer appear correctly. When I parse them with Hpricot, however, they display correctly.

I'd rather not use Hpricot because it appears to be impossible to add Hpricot::Elem instances to a document (without converting them to strings, adding, then parsing again).

Can I disable Nokogiri's error correction so that I can preserve the HTML closer to the way it was written?


Solution

  • Your XHTML is not valid XHTML. If I copy the contents from http://pastie.org/2638305, save them as 'foo.xhtml' and then try to open that file in Chrome, I see:

    This page contains the following errors:
    error on line 768 at column 39: attributes construct error

    Line 768 contains the following (truncated):

    <img src="..." alt="Talk to us now!"http://wholesaleinsurance.net/>
    

    As you can see, that is not syntactically valid: a bare URL follows the closing quote of the alt value, with no attribute name and no whitespace before it.

    You claim that you ran the page through validator.w3.org, but when I do that with the contents of your pastie I get:

    Errors found while checking this document as XHTML 1.0 Strict!
    Result: 15 Errors, 3 warning(s)

    So, is your actual content different from what you put in the pastie?