www.google.com returns HTTP 301


I'm looking at this example of making a simple HTTP request in Python using only the built-in socket module:

import socket

target_host = "www.google.com"
target_port = 80


client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((target_host, target_port))
client.send(b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n")
response = client.recv(4096) 
client.close()
print(response)

When I run this code, I get back a 301:

<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>

I'm confused by this because the "new location" looks identical to the URL I requested. Using curl or wget on the same URL (www.google.com) returns a 200, and I'm having a hard time understanding what is different. Are curl and wget getting the same 301 "behind the scenes" and just automatically requesting the "new" resource? And if so, how is that possible given that, as mentioned above, the "new" location appears identical to the original?


Solution

  • I'm confused by this because the "new location" looks identical to the URL I requested

    It doesn't. You connect to www.google.com, but your Host header says you are requesting google.com, i.e. without www:

    client.send(b"GET / HTTP/1.1\r\nHost: google.com\r\n\r\n")
    

    This gets redirected to www.google.com, i.e. with www:

    <A HREF="http://www.google.com/">here</A>.
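
To avoid the redirect, the Host header has to name the same host the request is meant for. Tools like curl derive the Host header from the URL, so asking for www.google.com sends `Host: www.google.com`, which is why they do not hit this mismatch. Below is a minimal sketch of the same idea with raw sockets. The helper names (`build_get_request`, `parse_status_and_location`, `fetch`) are made up for illustration and are not part of the original example, and the code assumes the whole header block arrives in the first `recv` call, which is usually but not necessarily true.

```python
import socket


def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request whose Host header matches `host`."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_and_location(response: bytes):
    """Return (status_code, Location header or None) from a raw HTTP response."""
    # Only the header block (everything before the first blank line) is parsed.
    head = response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # Status line looks like: "HTTP/1.1 301 Moved Permanently"
    status_code = int(lines[0].split(" ", 2)[1])
    location = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            location = value.strip()
    return status_code, location


def fetch(host: str, port: int = 80):
    """Connect to `host`, send a GET with a matching Host header, parse the reply."""
    with socket.create_connection((host, port), timeout=10) as client:
        client.sendall(build_get_request(host))
        # Assumption: the status line and headers fit in the first 4096 bytes.
        response = client.recv(4096)
    return parse_status_and_location(response)
```

Calling `fetch("www.google.com")` sends `Host: www.google.com`, while `fetch("google.com")` reproduces the situation from the question. In both cases the returned status code and `Location` value show whether, and to where, the server redirected.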