I am trying to fetch a website's HTML as a string, but I get the external website's 404 page instead of the HTML of its index page.
I have tried both cURL and file_get_contents. Both return the external website's 404 page instead of the index page's HTML.
$homepage = file_get_contents("https://www.creditkarma.ca");
echo $homepage;
cURL:
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_VERBOSE, true);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$homepage = file_get_contents_curl("https://www.creditkarma.ca");
echo $homepage;
The code should return the HTML of the index page as a string, but it returns the external website's 404 page. How can I solve this? I need the index page as a string.
Note: the 404 comes from the external website, not from my .htaccess.
With cURL, if you want to retrieve the HTML of a page, you should send the request headers a browser would send. As a security precaution, many websites deny traffic (or respond with a 404) when no browser information is present. So when I do this, I try to make the request look as if it came from a browser. In the code in your question, $agent is assigned outside file_get_contents_curl(), so it is not in scope inside the function and the User-Agent option most likely ends up empty. Something like this should fit the bill:
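$url = "https://www.creditkarma.ca";
// Browser-like User-Agent string sent with the request
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
// Disables certificate verification; convenient for testing, not recommended in production
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// Print the connection and header details to STDERR
curl_setopt($ch, CURLOPT_VERBOSE, true);
// Return the response body from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $url);
$result = curl_exec($ch);
var_dump($result);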
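If you want to keep the function from the question, the variable scope issue can be sketched as follows. This is a minimal illustration, not a verified fix for this particular site: agent_seen_inside_function() and file_get_contents_curl_with_agent() are hypothetical names, and the second one simply receives the agent as a parameter and also reports the HTTP status code.

```php
<?php
// The agent string is the one used in the answer above.
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

// Demonstrates the scope problem: inside a function, $agent refers to a
// separate local variable, which is unset here, so this returns null.
function agent_seen_inside_function()
{
    return isset($agent) ? $agent : null;
}

// Hypothetical variant of file_get_contents_curl() from the question.
// The agent is passed in explicitly, so it is in scope where it is used.
function file_get_contents_curl_with_agent($url, $agent)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);     // browser-like User-Agent
    $body = curl_exec($ch);                          // false on failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // status of the last response
    return array('status' => $status, 'body' => $body);
}
```

Calling file_get_contents_curl_with_agent("https://www.creditkarma.ca", $agent) and checking the 'status' entry before echoing 'body' makes it easier to tell a 404 from a successful response. Whether the server answers with 200 still depends on the site.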
UPDATE
I have tested this as a standalone PHP script and get the following results:
* Trying 104.100.143.79:443...
* TCP_NODELAY set
* Connected to www.creditkarma.ca (104.100.143.79) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
CApath: /etc/ssl/certs
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: businessCategory=Private Organization; jurisdictionC=US; jurisdictionST=Delaware; serialNumber=4313894; C=US; ST=California; L=San Francisco; O=Credit Karma Inc.; CN=www.creditkarma.ca
* start date: Mar 16 00:00:00 2020 GMT
* expire date: Mar 21 12:00:00 2022 GMT
* subjectAltName: host "www.creditkarma.ca" matched cert's "www.creditkarma.ca"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert SHA2 Extended Validation Server CA
* SSL certificate verify ok.
> GET / HTTP/1.1
Host: www.creditkarma.ca
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)
Accept: */*
* old SSL session ID is stale, removing
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< x-content-security-policy:
< Server: CK-FG-server
< Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< ORIGIN-ENV: production
< ORIGIN-DC: us-east4
< Expires: Wed, 12 Jan 2022 18:20:46 GMT
< Cache-Control: max-age=0, no-cache, no-store
< Pragma: no-cache
< Date: Wed, 12 Jan 2022 18:20:46 GMT
< Transfer-Encoding: chunked
< Connection: keep-alive
< Connection: Transfer-Encoding
< Set-Cookie: ck_cabf=IjA5MTRmMDQ2LTE3OTAtNDQ5MC1hODA3LWUzZTRlZDcwYTdlYSI=; Max-Age=31536000; Expires=Thu, 12 Jan 2023 18:20:46 GMT; Secure; SameSite=Strict; Path=/
< Set-Cookie: ck_crumb=6da1442eb87cee1a6c0c08c56a9b07826949e3dc130925b0fcb774a83d566b71f5a9b634c4e4f198ae8dc4a6722abf41; Secure; HttpOnly; SameSite=Strict; Path=/
< Set-Cookie: ck_trace_id=5544f4ea-9d03-462b-ab5f-8a81c70c6c81; HttpOnly; SameSite=Strict; Path=/
< Set-Cookie: ck_lang=en; SameSite=Strict; Path=/
<
* Connection #0 to host www.creditkarma.ca left intact
string(63139) "<!DOCTYPE html>
<html>
<head>
..... Rest of page here
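The same idea should carry over to the file_get_contents attempt from the question, since it accepts a stream context that can set a User-Agent. The sketch below is untested against this site, and make_browser_context() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds a stream context that sends a User-Agent
// header with requests made through the http/https stream wrappers.
function make_browser_context($agent)
{
    return stream_context_create(array(
        'http' => array(
            'method' => 'GET',
            'user_agent' => $agent,
            // Return the response body even for 4xx/5xx statuses, so the
            // error page can be inspected instead of getting false.
            'ignore_errors' => true,
        ),
    ));
}
```

It would be used as file_get_contents("https://www.creditkarma.ca", false, make_browser_context($agent)), with $agent set to the same browser-like string as in the cURL example.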