For instance, using this code:
$curl = curl_init();
curl_setopt_array( $curl, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_URL => $url ) );
curl_exec( $curl );
$header = curl_getinfo( $curl, CURLINFO_HTTP_CODE );
curl_close( $curl );
$url = "http://upenn.edu" will not work, while $url = "http://www.upenn.edu" will work. Without the www. the response code I get is 0, whereas with the www. it is 200.
If I were to use PHP's get_headers("http://upenn.edu"), I would get two errors:
Warning: get_headers() [function.get-headers]: php_network_getaddresses: getaddrinfo failed: nodename nor servname provided, or not known
and
Warning: get_headers(http://upenn.edu) [function.get-headers]: failed to open stream: php_network_getaddresses: getaddrinfo failed: nodename nor servname provided, or not known
However, when I use the exact same code, http://google.com will work (as well as the expected http://www.google.com).
Then, for a website such as http://www.dogpile.com, including the www. part returns a response code of 0, whereas without the www. I get a 302.
Why is this? And is there a better method to ensure reliable results (i.e., where a www. is not present, yet the response code is still returned)?
I am new to using cURL and dealing with headers and response codes, so any help is appreciated. Thank you.
Although your question came up while using curl, it is actually independent of curl. Other HTTP client libraries will behave the same way with these examples, because the cause lies in the Domain Name System and in the services running on a given computer.
Curl is an HTTP library. If you make an HTTP request, by default it will try to connect to port 80 on a remote computer.
The remote computer is identified by an IP address, a number like 173.194.35.134; you probably know that already.
Most often, domain names are used instead of the numbers, for example google.com for 173.194.35.134.
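So telling curl to use the URI http://google.com/ will open a connection to 173.194.35.134:80.
The Domain Name System (DNS) resolves the domain google.com to that IP address. If that lookup fails, as the getaddrinfo warnings from get_headers show for upenn.edu, no connection is opened, no HTTP response arrives, and curl reports a response code of 0.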
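To check the DNS step on its own, independent of curl, here is a minimal sketch using PHP's gethostbyname(). The helper name resolveHost() is made up for this illustration, and gethostbyname() only covers IPv4 lookups, so treat it as a rough diagnostic rather than a complete resolver.

```php
<?php
// Hypothetical helper for illustration: returns the IPv4 address that a
// host name resolves to, or null when the lookup fails.
function resolveHost(string $host): ?string
{
    // gethostbyname() returns the unmodified input string on failure,
    // so verify that the result actually looks like an IPv4 address.
    $ip = gethostbyname($host);

    return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false
        ? $ip
        : null;
}
```

If resolveHost('upenn.edu') returns null while resolveHost('www.upenn.edu') returns an address, the failure happens during name resolution, before any HTTP request is sent.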
Domain names are organized in levels, separated by dots. The so-called Top Level Domain (TLD) is the rightmost part; for google.com that is com. The Second Level Domain (SLD) is accordingly google. With www.google.com you have another domain name, this one with three levels. The www is commonly referred to as a subdomain.
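The level structure can be made visible with a few lines of PHP. This is only a sketch of the naive dot-splitting view described above; the function name domainLevels() is invented for the example, and it does not account for registries that use more than one label as their public suffix.

```php
<?php
// Hypothetical helper for illustration: splits a host name into its
// levels, ordered from the right (index 0 is the top level domain).
function domainLevels(string $host): array
{
    // Drop an optional trailing dot and normalise case before splitting.
    $labels = explode('.', strtolower(rtrim($host, '.')));

    // Reverse so that the rightmost label (the TLD) comes first.
    return array_reverse($labels);
}
```

For www.google.com this gives com at index 0, google at index 1 and www at index 2, while google.com stops after index 1.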
The most important part here is that for every different domain name, DNS can return a different IP address, or none at all.
Therefore www.google.com and google.com can be two totally different things. The www subdomain is only a common convention for naming the web server on a network organized under an SLD.TLD.
Since this is a common convention, you could try both and see which one works. However, I would not try anything beyond with and without www.
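Trying both variants could look roughly like the following. This is a sketch, not a definitive implementation: the function names wwwVariants(), httpStatus() and firstWorkingStatus() are invented for this example, the timeouts are arbitrary, and it assumes the URL has a scheme such as http://.

```php
<?php
// Hypothetical helper: returns the URL as given plus the same URL with
// the leading "www." added or removed. URLs that cannot be parsed into
// a scheme and a host are returned unchanged.
function wwwVariants(string $url): array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return [$url];
    }

    $host = $parts['host'];
    $alt  = stripos($host, 'www.') === 0 ? substr($host, 4) : 'www.' . $host;

    // Look for the host only after "scheme://" and swap it in place.
    $pos = stripos($url, $host, strlen($parts['scheme']) + 3);
    if ($pos === false) {
        return [$url];
    }

    return [$url, substr_replace($url, $alt, $pos, strlen($host))];
}

// Same request as in the question, with timeouts added so that a dead
// host does not block for long. Returns 0 when no response arrived.
function httpStatus(string $url): int
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
    ]);
    curl_exec($curl);

    return (int) curl_getinfo($curl, CURLINFO_HTTP_CODE);
}

// Tries the URL as given, then the other www variant, and returns the
// first non-zero response code (or 0 if neither variant answered).
function firstWorkingStatus(string $url): int
{
    foreach (wwwVariants($url) as $candidate) {
        $code = httpStatus($candidate);
        if ($code !== 0) {
            return $code;
        }
    }

    return 0;
}
```

Calling firstWorkingStatus("http://upenn.edu") would then fall back to http://www.upenn.edu when the first lookup fails. Keep in mind that the two names may point to different servers, so the code you get back describes whichever variant answered.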