
SimpleHTMLDom returns a 500 error when printing output


When I try to print the output of Simple HTML DOM, I get a 500 error. I tried the following methods, but the error was the same.

  • Method 1

    $html = file_get_html("http://www.google.com");

    print_r($html);

    After reading answers to other questions, I checked whether allow_url_fopen was enabled, and it was.

  • Method 2

    $html = file_get_contents("http://www.google.com");

    print_r($html);

    This works, but when I parse it with the following code, I get the 500 error again.

    $object = new simple_html_dom();

    $object->load($html);

    var_dump($object);

  • Method 3

    As a last resort, I tried fetching the page with cURL and then parsing it. To make sure cURL was working, I printed its output, and it was. But when I parsed the result with Simple HTML DOM, printing the output again produced a 500 error.

[Sat Sep 08 21:26:19.456961 2018] [:error] [pid 703804] ModSecurity: Output filter: Response body too large (over limit of 404800001, total not specified).

I increased the limit by almost 100 times, but I still get the same error.


Solution

  • The error message shows that ModSecurity is rejecting the response because the response body is too large. This does not mean anything is wrong with loading the HTML using the Simple HTML DOM library; the problem is the size of the response generated by your code (the print_r or var_dump calls). My guess is that the HTML you're loading requires a large number of nested objects to represent its DOM tree, so when you output the full structure with print_r or var_dump, the response becomes too large.

    You can verify that the HTML is loaded and parsed by printing the plain HTML of the page instead (use print rather than print_r on the simple_html_dom object):

    $html = file_get_html("http://www.google.com");
    
    print($html);
    

    and you will see that the HTML is retrieved correctly. You can then work with the $html object to manipulate the DOM as you would with any simple_html_dom object.

    If you want to raise ModSecurity's output limit so that larger responses are allowed, have a look at this question: Mod Security response/request body size?
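To illustrate why dumping a parsed tree can produce far more output than the page itself, here is a minimal self-contained PHP sketch. The classes ToyNode and ToyDocument and the helpers buildNestedDocument() and dumpSize() are hypothetical illustrations, not Simple HTML DOM's real internals; the sketch only assumes that a parsed tree stores parent/child back-references plus a flat list of all nodes. Because print_r only prints *RECURSION* for an object that is already being printed on the current path, a node reachable from several places (its parent, its child, the flat list) gets dumped again each time, so the dump grows much faster than the HTML it came from.

```php
<?php
// Hypothetical toy classes, NOT simple_html_dom's real internals: each node keeps
// a back-reference to its parent, and the document keeps a flat list of all nodes,
// loosely mimicking how a parsed DOM tree might be stored.
final class ToyNode
{
    public $tag;
    public $parent = null;
    public $children = [];

    public function __construct(string $tag)
    {
        $this->tag = $tag;
    }
}

final class ToyDocument
{
    public $nodes = [];
}

// Build a document of $depth nested <div> elements, linking parents and children.
function buildNestedDocument(int $depth): ToyDocument
{
    $doc = new ToyDocument();
    $parent = null;
    for ($i = 0; $i < $depth; $i++) {
        $node = new ToyNode('div');
        if ($parent !== null) {
            $node->parent = $parent;
            $parent->children[] = $node;
        }
        $doc->nodes[] = $node;
        $parent = $node;
    }
    return $doc;
}

// print_r($value, true) returns the dump as a string instead of echoing it,
// so its size can be checked before anything is sent to the client.
function dumpSize($value): int
{
    return strlen(print_r($value, true));
}

// The equivalent markup for 40 nested divs is only a few hundred bytes.
$html = str_repeat('<div>', 40) . str_repeat('</div>', 40);
$small = dumpSize(buildNestedDocument(10));
$large = dumpSize(buildNestedDocument(40));

printf("HTML source: %d bytes\n", strlen($html));
printf("print_r of 10 nested nodes: %d bytes\n", $small);
printf("print_r of 40 nested nodes: %d bytes\n", $large);
```

Quadrupling the number of nodes multiplies the dump size by much more than four, while the HTML only grows linearly. Measuring print_r($value, true) with strlen() before echoing it, or simply printing the HTML as described above, avoids pushing a very large dump through ModSecurity's output filter.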