php, curl, simple-html-dom

Call to a member function find() on null in PHP Simple HTML DOM


I want to use PHP Simple HTML DOM to extract the links from this page (https://www.technolife.ir/product-3303).

The code I wrote is as follows:

$url = "https://www.technolife.ir/product-3303";        
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_REFERER, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);
$html_base = new simple_html_dom();

foreach($html_base->find('a') as $element) {
    echo "<pre>";
    print_r( $element->href );
    echo "</pre>";
}


Unfortunately, running it fails with this error:

Call to a member function find() on null

Solution

  • https://www.technolife.ir/product-3303 serves gzip-compressed content even when the client does not request compression, so curl_exec() returns raw gzip-compressed bytes. To simple_html_dom that binary data looks like complete junk, and parsing it fails.

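    Luckily, libcurl has built-in support for decompressing gzip. Enable it before calling curl_exec() with curl_setopt($curl, CURLOPT_ENCODING, ''); the empty string tells libcurl to advertise every encoding it supports and to decode the response automatically.

    Note also that the code as posted never passes $str to the parser: new simple_html_dom() without an argument creates an empty parser with no root node, so find() fails with this same "on null" error. Load the response first with $html_base->load($str); (or create the parser with str_get_html($str)) before calling find().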

    That said, you should use PHP's built-in DOMDocument instead of simple_html_dom; it ships with PHP's DOM extension, so no third-party include is needed:

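    // Collect libxml parse warnings instead of printing them; real-world HTML is rarely valid
    libxml_use_internal_errors(true);
    $html_base = new DOMDocument();
    $html_base->loadHTML($str);
    libxml_clear_errors();
    foreach ($html_base->getElementsByTagName('a') as $element) {
        echo "<pre>";
        echo $element->getAttribute('href');
        echo "</pre>";
    }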
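Putting the pieces of the answer together, here is a minimal sketch, assuming the cURL, zlib and DOM extensions are installed. The helper names fetch_html(), gunzip_if_needed() and extract_links() are hypothetical. The gzip magic-byte check is an extra fallback not mentioned in the answer, meant for servers that gzip the body without sending a Content-Encoding header; unlike the question's code, the sketch also leaves TLS certificate verification (CURLOPT_SSL_VERIFYPEER) at its default.

```php
<?php
// Sketch only: fetch_html(), gunzip_if_needed() and extract_links() are hypothetical helpers.

function fetch_html(string $url): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    // '' = advertise every encoding libcurl supports and decode the response automatically
    curl_setopt($curl, CURLOPT_ENCODING, '');
    $body = curl_exec($curl);
    if ($body === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($curl));
    }
    // No curl_close(): the handle is freed automatically when $curl goes out of scope.
    return gunzip_if_needed($body);
}

// Fallback (assumption): decode the body ourselves if it still starts with
// the gzip magic bytes 0x1f 0x8b, e.g. when no Content-Encoding header was sent.
function gunzip_if_needed(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Return the href value of every <a> element that has one.
function extract_links(string $html): array
{
    $links = [];
    if ($html === '') {
        return $links; // DOMDocument::loadHTML() rejects an empty string
    }
    $previous = libxml_use_internal_errors(true); // collect parse warnings silently
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

// Usage (needs network access, so it is left commented out):
// foreach (extract_links(fetch_html('https://www.technolife.ir/product-3303')) as $href) {
//     echo $href, PHP_EOL;
// }
```

Keeping the fetching, decoding and parsing steps in separate functions makes it easy to check the parsing step against a local HTML string without touching the network.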