I am trying to scrape the table content from a page on canada.ca. The retrieved content seems to be missing certain parts, as described in the following code:
<?php
$url="https://www.canada.ca/en/immigration-refugees-citizenship/corporate/mandate/policies-operational-instructions-agreements/ministerial-instructions/express-entry-rounds.html";
$base = $url;
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36');
$str = curl_exec($curl);
curl_close($curl);
echo $str; // judging from the output, the table header ("# Date Round type ...") is present, but the other table rows are missing
It seems that the "tr" elements inside "table/tbody" are lost. It must be something in the cURL parameters. What could it be?
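The rows do not appear to be part of the HTML that cURL receives. They are most likely filled in by JavaScript after the page loads, and cURL does not execute scripts, so changing the cURL parameters will probably not help. You can import the data from the .json URL instead: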
https://www.canada.ca/content/dam/ircc/documents/json/ee_rounds_123_en.json
You can try this PHP code:
<?php
// The URL of the JSON file
$json_url = 'https://www.canada.ca/content/dam/ircc/documents/json/ee_rounds_123_en.json';
// Fetch the JSON data using file_get_contents
$json_data = file_get_contents($json_url);
// Check if the fetch was successful
if ($json_data === false) {
    die('Error fetching JSON data');
}
// Decode the JSON data
$decoded_data = json_decode($json_data, true);
// Check if decoding was successful
if ($decoded_data === null) {
    die('Error decoding JSON data');
}
// Use the decoded data
print_r($decoded_data);
?>
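Once the data is decoded, you will probably want the individual rows rather than the raw print_r dump. I have not checked the exact layout of the file, so the following is only a sketch that avoids assuming any key names: find_rows() and row_to_line() are hypothetical helpers, and $sample is made-up data standing in for $decoded_data. It looks for the first nested list of records and prints each one as a line.

```php
<?php
// Hypothetical helper: return the first nested list of records found in the
// decoded JSON, without assuming any particular key names.
function find_rows(array $data): array
{
    // Treat a non-empty, sequentially indexed array of arrays as the row list.
    $isList = $data !== [] && array_keys($data) === range(0, count($data) - 1);
    if ($isList && count(array_filter($data, 'is_array')) === count($data)) {
        return $data;
    }
    // Otherwise search the nested arrays depth-first.
    foreach ($data as $value) {
        if (is_array($value)) {
            $rows = find_rows($value);
            if ($rows !== []) {
                return $rows;
            }
        }
    }
    return [];
}

// Hypothetical helper: render one record as "key=value | key=value".
function row_to_line(array $row): string
{
    $cells = [];
    foreach ($row as $key => $value) {
        // Nested or null values are JSON-encoded so they can still be printed.
        $cells[] = $key . '=' . (is_scalar($value) ? $value : json_encode($value));
    }
    return implode(' | ', $cells);
}

// Made-up sample; in practice pass $decoded_data from the code above.
$sample = [
    'meta'  => ['source' => 'example'],
    'items' => [
        ['id' => '1', 'label' => 'first'],
        ['id' => '2', 'label' => 'second'],
    ],
];

foreach (find_rows($sample) as $row) {
    echo row_to_line($row), PHP_EOL;
}
```

If the real file turns out to have a known top-level key for the rows, reading that key directly is simpler than searching for it.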