php libxml2

libxml htmlParseDocument ignoring htmlParseOption flags


Can someone who uses libxml2 outside of the build packaged with PHP confirm whether the HTML_PARSE_NOWARNING flag is ignored? With the flag set, warnings are still generated.

The relevant C source from PHP's libxml integration:

// options includes HTML_PARSE_NOWARNING (value 64)
htmlCtxtUseOptions(ctxt, (int)options);

ctxt->vctxt.error = php_libxml_ctx_error;
ctxt->vctxt.warning = php_libxml_ctx_warning;
if (ctxt->sax != NULL) {
    ctxt->sax->error = php_libxml_ctx_error;
    ctxt->sax->warning = php_libxml_ctx_warning;
}
htmlParseDocument(ctxt); // warnings are still produced here

Solution

  • libxml2 does not ignore the HTML_PARSE_NOWARNING flag. Calling htmlCtxtUseOptions with HTML_PARSE_NOWARNING unregisters the warning handlers by setting them to NULL. The PHP code then installs its own handlers unconditionally, which overrides that and makes the flag ineffective. The PHP code should either check the options before installing the handlers:

    htmlCtxtUseOptions(ctxt, (int)options);
    
    if (!(options & HTML_PARSE_NOERROR)) {
        ctxt->vctxt.error = php_libxml_ctx_error;
        if (ctxt->sax != NULL)
            ctxt->sax->error = php_libxml_ctx_error;
    }
    if (!(options & HTML_PARSE_NOWARNING)) {
        ctxt->vctxt.warning = php_libxml_ctx_warning;
        if (ctxt->sax != NULL)
            ctxt->sax->warning = php_libxml_ctx_warning;
    }
    htmlParseDocument(ctxt);
    

    Or call htmlCtxtUseOptions after setting the handlers, so that the flags can clear them again:

    ctxt->vctxt.error = php_libxml_ctx_error;
    ctxt->vctxt.warning = php_libxml_ctx_warning;
    if (ctxt->sax != NULL) {
        ctxt->sax->error = php_libxml_ctx_error;
        ctxt->sax->warning = php_libxml_ctx_warning;
    }
    
    htmlCtxtUseOptions(ctxt, (int)options);
    htmlParseDocument(ctxt);
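
The ordering problem can be reproduced without libxml2 or PHP. The following is a minimal sketch of a simplified model, not libxml2's actual code: model_ctxt, model_use_options, model_parse, and the OPT_* names are hypothetical stand-ins, and the only assumption carried over from the answer is that the suppression flags set the matching handlers to NULL. The flag values mirror HTML_PARSE_NOERROR (32) and HTML_PARSE_NOWARNING (64).

```c
#include <stddef.h>

/* Hypothetical stand-ins for the libxml2 option flags (same bit values). */
enum { OPT_NOERROR = 1 << 5, OPT_NOWARNING = 1 << 6 };

typedef void (*handler_fn)(const char *msg);

/* Simplified model of a parser context: just the two callbacks. */
typedef struct {
    handler_fn error;
    handler_fn warning;
} model_ctxt;

static int warnings_seen = 0;

static void count_warning(const char *msg) { (void)msg; warnings_seen++; }
static void count_error(const char *msg) { (void)msg; }

/* Assumed behavior, as described in the answer: a suppression flag
 * clears the corresponding handler. */
static void model_use_options(model_ctxt *c, int options) {
    if (options & OPT_NOWARNING) c->warning = NULL;
    if (options & OPT_NOERROR) c->error = NULL;
}

/* Models a parse that hits exactly one recoverable problem. */
static void model_parse(model_ctxt *c) {
    if (c->warning != NULL) c->warning("unexpected tag");
}

/* Order used in the question: options first, then handlers installed
 * unconditionally. Returns the number of warnings delivered. */
int warnings_with_buggy_order(int options) {
    model_ctxt c = { NULL, NULL };
    warnings_seen = 0;
    model_use_options(&c, options);
    c.error = count_error;
    c.warning = count_warning;   /* overwrites the NULL set by the flag */
    model_parse(&c);
    return warnings_seen;
}

/* Second fix from the answer: handlers first, options last, so the
 * flag can clear the handler again. */
int warnings_with_fixed_order(int options) {
    model_ctxt c = { NULL, NULL };
    warnings_seen = 0;
    c.error = count_error;
    c.warning = count_warning;
    model_use_options(&c, options);
    model_parse(&c);
    return warnings_seen;
}
```

In this model, warnings_with_buggy_order(OPT_NOWARNING) still delivers the warning, while warnings_with_fixed_order(OPT_NOWARNING) delivers none, and passing 0 to the fixed version keeps warnings enabled.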