Trying to scrape kickasstorrents with simple html dom


I am trying to scrape kickasstorrents with simple html dom, but I get an error before I have even started parsing. I followed some simple html dom tutorials, set up my URL, and am fetching the page with cURL.

Code is as follows:

<?php
require('inc/config.php');
include_once('inc/simple_html_dom.php');

function scrap_kat() {

// initialize curl
$html = 'http://katcr.to/new/';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $html);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$ip=rand(0,255).'.'.rand(0,255).'.'.rand(0,255).'.'.rand(0,255);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/".rand(3,5).".".rand(0,3)." (Windows NT ".rand(3,5).".".rand(0,2)."; rv:2.0.1) Gecko/20100101 Firefox/".rand(3,5).".0.1");
$html2 = curl_exec($ch);
if($html2 === false)
{
    echo 'Curl error: ' . curl_error($ch);
}
else
{
    // create HTML DOM
    $kat = file_get_contents($html);
}
curl_close($ch);

// scripting starts




// clean up memory
$kat->clear();
unset($kat);
// return information
return $ret;

}
$ret = scrap_kat();
echo $ret;
?>

I receive this error:

Fatal error: Call to a member function clear() on resource in C:\wamp64\www\index.php on line 36

What am I doing wrong? Thanks.


Solution

  • simple_html_dom is a class. The clear() method belongs to that class (simple_html_dom_node has one as well), so you can only call it on a simple_html_dom object.

    @Hassaan is correct: file_get_contents is a native PHP function that returns the page as a string, so you have to create an object of the simple_html_dom class yourself, like this:

    $html = new simple_html_dom();
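
    To see why the call in the question fails, here is a minimal check. It assumes a temporary file as a stand-in for the remote page, purely so the snippet runs offline. It shows that file_get_contents() hands back a plain string, which has no clear() method:

    ```php
    <?php
    // Stand-in for the remote page so the check runs offline (an assumption,
    // not part of the original code).
    $path = tempnam(sys_get_temp_dir(), 'kat');
    file_put_contents($path, '<a href="/new/">new</a>');

    // file_get_contents() returns the file content as a string, or false on failure.
    $kat = file_get_contents($path);
    unlink($path);

    var_dump(is_string($kat)); // the content is a plain string
    var_dump(is_object($kat)); // so it is not an object

    // Calling $kat->clear() at this point would raise a fatal
    // "Call to a member function clear()" error, because only objects have methods.
    ```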
    

    Then use the code below.

    include_once('inc/simple_html_dom.php');

    function scrap_kat() {
        $url = 'http://katcr.to/new/';
        // $timeout = 120;

        // Create the parser object.
        $html = new simple_html_dom();

        #### CURL BLOCK ####
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
        curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/".rand(3,5).".".rand(0,3)." (Windows NT ".rand(3,5).".".rand(0,2)."; rv:2.0.1) Gecko/20100101 Firefox/".rand(3,5).".0.1");
        //curl_setopt($curl, CURLOPT_TIMEOUT, $timeout);
        // These headers are sent as-is; they do not change the address the
        // server sees for the connection itself.
        $ip = rand(0,255).'.'.rand(0,255).'.'.rand(0,255).'.'.rand(0,255);
        curl_setopt($curl, CURLOPT_HTTPHEADER, array("REMOTE_ADDR: $ip", "HTTP_X_FORWARDED_FOR: $ip"));
        $content = curl_exec($curl);
        if ($content === false) {
            // Stop here: there is nothing to parse if the request failed.
            echo 'Curl error: ' . curl_error($curl);
            curl_close($curl);
            return;
        }
        curl_close($curl);
        #### END CURL BLOCK ####

        // Note the variable change: load the string returned by curl into the object.
        $html->load($content);
        //echo $ip;
        print_r($html->find('a'));

        // clean up memory
        $html->clear();
        unset($html);
    }
    scrap_kat();
    

    There are quite a few errors in your code, so I am only showing how you can do this. If you need an explanation, please comment below this answer and I will add one.
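
As a follow-up, print_r() on the nodes returned by find('a') can be very verbose, because each node carries references back into the parsed document. A rough sketch of pulling out only the link text and href is below. The helper name extract_links() and the inline sample markup are hypothetical, and the include path is the one from the question, so adjust it to wherever simple_html_dom.php lives:

```php
<?php
// Path taken from the question; adjust to your own layout.
include_once 'inc/simple_html_dom.php';

/**
 * Hypothetical helper: return the text and href of every <a> in $content.
 */
function extract_links($content)
{
    // Nothing to parse if the download failed or came back empty.
    if ($content === false || $content === '') {
        return array();
    }

    // str_get_html() builds a simple_html_dom object from a string.
    $html = str_get_html($content);
    if (!$html) {
        return array();
    }

    $links = array();
    foreach ($html->find('a') as $a) {
        $links[] = array(
            'text' => trim($a->plaintext),
            'href' => $a->href,
        );
    }

    // Free the parser's internal references, as in the answer above.
    $html->clear();
    unset($html);

    return $links;
}

// Inline markup stands in for the string returned by curl_exec().
$sample = '<ul><li><a href="/a.html">First</a></li><li><a href="/b.html">Second</a></li></ul>';
foreach (extract_links($sample) as $link) {
    echo $link['text'], ' => ', $link['href'], PHP_EOL;
}
```

In scrap_kat() the same helper could be called with $content in place of $sample, once curl_exec() has been checked for failure.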