QueryPath and Malformed HTML


I'm using QueryPath to manipulate a page's DOM. The page contains some tags that the parser behind QueryPath doesn't know how to interpret.

I've tried passing the following as options but I still get errors:

ignore_parser_warnings
use_parser (html)

I get the following errors with these enabled:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nobr invalid in Entity

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity

Any help would be greatly appreciated.


Solution

  • Try the libxml functions

    libxml_use_internal_errors(TRUE);
    $dom->load('whatever'); // or whatever you use for loading the DOM
    libxml_clear_errors();
    

    Instead of just clearing the errors, you can handle them (libxml_get_errors() returns the collected list), though the above should be sufficient for most cases.
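
If you do want to look at the errors rather than discard them, something along these lines may work. This is only a sketch using plain DOMDocument (which is what the warnings in the question come from), not QueryPath itself; the sample markup and the variable names $html, $messages and $previous are made up for illustration, and the exact messages you get back depend on your libxml2 version.

```php
<?php
// Hypothetical malformed markup resembling the question:
// a non-standard <nobr> tag and a bare '&' that is not an entity.
$html = '<html><body><nobr>Fish & chips</nobr></body></html>';

// Collect parser errors internally instead of emitting PHP warnings.
// The return value is the previous setting, so it can be restored later.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Gather whatever the parser complained about (may be empty).
$messages = [];
foreach (libxml_get_errors() as $error) {
    // Each item is a LibXMLError with level, code, line and message.
    $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
}

// Empty the error buffer and restore the earlier error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document is still usable even though the markup was not clean.
$nobr = $dom->getElementsByTagName('nobr');
```

From there you could log $messages, or ignore them once you have confirmed they are harmless. Restoring the previous setting keeps the change from leaking into other code that relies on the default warnings.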