PHP cURL not showing a part of content from some point


I have been struggling for a while to make this work, but it seems that I am missing something. The scenario is this:
I am trying to get some information from a website using PHP and cURL with a DOMXPath query. I can extract content down to a certain point in the document, but below that point I get nothing back, just blank output. The script that I am using is below:

$target_url = "https[:]//[www][.]bankofalbania[.]org/Tregjet/Kursi_zyrtar_i_kembimit/"; //Remove [ and ] from url
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);

$html = curl_exec($ch);
// With CURLOPT_RETURNTRANSFER set, curl_exec() returns false on failure
if ($html === false) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}

// parse the html into a DOMDocument
$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();
$selector = new DOMXPath($document);

$anchors = $selector->query('/html/body/div[1]/section[1]/div/div[2]/div[2]/div[2]/div/table[1]/tbody/tr[1]/td[1]');
foreach ($anchors as $div) {
    $value = $div->nodeValue;
    echo $value;
}

Interestingly, if $anchors is changed to this:
$anchors = $selector->query('/html/body/div[1]/section[1]/div/div[2]/div[2]/div[2]/div/table[1]');
the content is extracted from the website. I should also mention that I have tried changing the query to something more direct, as below:

$anchors = $selector->query('//table[@class="table table-sm table-responsive w-100 d-block d-md-table table-bordered m-0"]/tbody/tr[1]/td[3]');

but the result is the same: nothing is returned. I don't know what I am missing here, but I can't make it work. What I am trying to get is the USD value from the table on the page at $target_url.
Thank you in advance :-)


Solution

  • There are no tbody tags in the HTML, and unlike a browser, PHP's DOMDocument doesn't add them automatically (keep that in mind when you copy an XPath from the developer tools provided by your browser). Also, the USD amount is in the third cell, so the correct XPath query is:

    /html/body/div[1]/section[1]/div/div[2]/div[2]/div[2]/div/table[1]/tr[1]/td[3]
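
The difference can be reproduced without touching the network. The sketch below is only an illustration: the markup, the "rates" class name, and the numbers are made up and merely stand in for the real page. It runs the same kind of query with and without the tbody step, plus a variant using // that should work whether or not a tbody element is present.

```php
<?php
// Hypothetical markup standing in for the real page: a table with no <tbody>.
// The class name and the values are invented for this example.
$html = '<html><body><table class="rates">'
      . '<tr><td>USD</td><td>1</td><td>101.50</td></tr>'
      . '<tr><td>EUR</td><td>1</td><td>110.25</td></tr>'
      . '</table></body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true);   // silence parser warnings, as in the question
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);

// Path copied from browser dev tools style: includes a tbody step that
// does not exist in the tree built by DOMDocument, so nothing matches.
$withTbody = $xpath->query('//table[@class="rates"]/tbody/tr[1]/td[3]');

// Same path without the tbody step: matches the third cell of the first row.
$withoutTbody = $xpath->query('//table[@class="rates"]/tr[1]/td[3]');

// Descendant axis before tr: matches with or without a tbody in between.
$either = $xpath->query('//table[@class="rates"]//tr[1]/td[3]');

echo "with tbody:    ", $withTbody->length, " match(es)\n";
echo "without tbody: ", $withoutTbody->item(0)->nodeValue, "\n";
echo "using //tr:    ", $either->item(0)->nodeValue, "\n";
```

The //tr variant is handy when the same query has to work on pages that may or may not include tbody, but note that it selects the first row of every row group inside the table, so it can return more than one node if the table contains nested tables or several tbody sections.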