I have a strange problem with this site when using PHP's file_get_contents or curl, or wget from bash.
If I try to download this page, I get a small file that contains only the string HNGJpP5b-452.
With normal browsers (Chrome, Konqueror and others, even in incognito mode, so this does not depend on a login problem), the page downloads correctly. The link is:
link = https://rutracker.net/forum/viewforum.php?f=1992
I've used this php code:
<?php
$lnks = array("https://rutracker.net/forum/viewforum.php?f=1992", "https://example.com");
foreach ($lnks as $lnk) {
    echo "Working with url: ".$lnk."<br>\n";
    echo "========================================================================<br>\n";
    // file_get_contents part
    $html = file_get_contents($lnk);
    echo "file_get_contents get this: ".$html."<br>\n<br>\n";
    // curl part
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $lnk);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    $html = curl_exec($ch);
    echo "curl get this: ".$html."<br>\n<br>\n";
}
?>
Result is:
Working with url: https://rutracker.net/forum/viewforum.php?f=1992
========================================================================
file_get_contents get this: HNGJpP5b-452
curl get this: HNGJpP5b-452
Working with url: https://example.com
========================================================================
file_get_contents get this:
Example Domain
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
More information...
curl get this:
Example Domain
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
More information...
It doesn't seem to be caused by the user agent: for curl I tried setting CURLOPT_USERAGENT to the same value Chrome sends, without any change.
I get the same result with wget in bash.
Any ideas? Regards.
For whatever reason, this website returns that string when no Accept-Encoding header is present on the request.
You can add an Accept-Encoding header to file_get_contents() using a stream context:
$context = stream_context_create([
"http" => [
"header" => "Accept-Encoding: gzip,deflate,br\r\n"
]
]);
$content = file_get_contents($lnk, false, $context);
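Note that file_get_contents() does not decompress the response itself, so if the server answers with a compressed body you have to decode it (for example with gzdecode() for gzip).
Alternatively, set the header on a curl request with CURLOPT_ENCODING, in which case curl also decodes the response for you:
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate,br');
Passing an empty string instead makes curl advertise every encoding it was built to support, which avoids asking for br on a build without Brotli.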
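Because the stream wrapper hands back the body exactly as the server sent it, the file_get_contents() variant may need an explicit decoding step. The following is only a minimal sketch of one way to do that, assuming the zlib extension is loaded and that the server answers with gzip when asked for it. The helper names decode_body() and fetch_page() are hypothetical and were not part of the original answer.

```php
<?php
// Hypothetical helper: undo gzip if the final response declares it.
// $headers is the list of raw header lines, e.g. $http_response_header.
function decode_body(string $body, array $headers): string
{
    $encoding = '';
    foreach ($headers as $header) {
        if (stripos($header, 'HTTP/') === 0) {
            // A new status line starts a new response (after a redirect),
            // so forget what an earlier response declared.
            $encoding = '';
        } elseif (stripos($header, 'Content-Encoding:') === 0) {
            $encoding = strtolower(trim(substr($header, strlen('Content-Encoding:'))));
        }
    }
    if ($encoding === 'gzip') {
        $decoded = gzdecode($body);
        // Fall back to the raw body if it turns out not to be valid gzip.
        return $decoded === false ? $body : $decoded;
    }
    return $body;
}

// Hypothetical helper: fetch a URL while advertising gzip only,
// since gzip is the one encoding decode_body() knows how to undo.
function fetch_page(string $url): string
{
    $context = stream_context_create([
        'http' => [
            'header' => "Accept-Encoding: gzip\r\n",
        ],
    ]);
    $body = file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }
    // The HTTP wrapper fills $http_response_header in the calling scope.
    return decode_body($body, $http_response_header ?? []);
}
```

With these in place, something like $html = fetch_page($lnk); should return readable HTML rather than compressed bytes. Advertising only gzip is a deliberate choice here: PHP has no built-in Brotli decoder, so asking for br could yield a body this sketch cannot decode.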