PHP cURL does not retrieve table content


I am trying to scrape table content from a page on canada.ca. The retrieved HTML seems to be missing parts of the table, as noted in the comment at the end of the following code:

<?php
$url="https://www.canada.ca/en/immigration-refugees-citizenship/corporate/mandate/policies-operational-instructions-agreements/ministerial-instructions/express-entry-rounds.html";
$base = $url;
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');

$str = curl_exec($curl);
curl_close($curl);

echo $str; // the output contains the table header ("#   Date    Round type  ..."), but the other table rows are missing

It seems that the "tr" elements inside "table/tbody" are lost. I suspect it is something in the cURL options. What could it be?


Solution

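  • The missing rows are most likely not in the page's HTML at all: they appear to be
    filled in by JavaScript after the page loads, and cURL does not run scripts, so it
    only receives the static header. Changing the cURL options will not help. Instead,
    you can import the data directly from the .json URL: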

    https://www.canada.ca/content/dam/ircc/documents/json/ee_rounds_123_en.json

    You can try this PHP code:

    <?php
    // The URL of the JSON file
    $json_url = 'https://www.canada.ca/content/dam/ircc/documents/json/ee_rounds_123_en.json';
    
    // Fetch the JSON data using file_get_contents
    $json_data = file_get_contents($json_url);
    
    // Check if the fetch was successful
    if ($json_data === false) {
        die('Error fetching JSON data');
    }
    
    // Decode the JSON data
    $decoded_data = json_decode($json_data, true);
    
    // Check if decoding was successful
    if ($decoded_data === null) {
        die('Error decoding JSON data');
    }
    
    // Use the decoded data
    print_r($decoded_data);
    ?>
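
Since the question already uses cURL, the same fetch can be sketched with cURL instead of file_get_contents, which may be useful where allow_url_fopen is disabled. This is only a minimal sketch: the helper names fetch_body and decode_json are made up for illustration, the 30-second timeout is an arbitrary choice, and nothing is assumed about the keys inside the JSON file.

```php
<?php
// Hypothetical helpers for illustration; not part of any library.

/**
 * Download a URL with cURL and return the response body.
 * Throws RuntimeException on transport errors or non-2xx status codes.
 */
function fetch_body(string $url): string
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 30,   // arbitrary limit, in seconds
    ]);

    $body   = curl_exec($curl);
    $error  = curl_error($curl);
    $status = curl_getinfo($curl, CURLINFO_RESPONSE_CODE);

    if ($body === false) {
        throw new RuntimeException("cURL error: $error");
    }
    if ($status < 200 || $status >= 300) {
        throw new RuntimeException("Unexpected HTTP status: $status");
    }
    return $body;
}

/**
 * Decode a JSON document into nested PHP arrays.
 * JSON_THROW_ON_ERROR (PHP 7.3+) raises JsonException on malformed input,
 * which avoids confusing a parse failure with a literal JSON null.
 */
function decode_json(string $json): array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new UnexpectedValueException('Expected a JSON object or array');
    }
    return $data;
}
```

With these in place, calling decode_json(fetch_body($json_url)) and passing the result to print_r(array_keys(...)) should show the top-level structure of the file, which you can then loop over once you know where the rows live.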