Search code examples
pythonsocketsgetaddrinfo

How to get html code using python sockets


So I'm trying to get the source code of google using only python sockets and not any other libraries such as urllib. I don't understand why my GET request isn't working, I tried all possible methods. This is the code I have, it's pretty small and I don't wanna get too much details. Just looking for a protocol that's used to get source codes. I assumed it would be the GET method but it doesn't work. I need a response that resembles urllib.request but using python sockets only.

  • If I pass "https://www.google.com" to socket.gethostbyname(), it fails on the getaddrinfo.
  • Also when I try to GET request from python.org, the while loop never ends.


import socket;

s=socket.socket();

host=socket.gethostbyname("www.google.com");

port=80;

send_buf="GET / \r\n"\
        "Host: www.google.com\r\n";

s.connect((host, port));

s.sendall(bytes(send_buf, encoding="utf-8"));

data="";

part=None;

while( True ):

    part=s.recv(2048);

    data+=str(part, "utf-8");

    if( part==b'' ):

        break;

s.close();

Solution

  • The following worked for me:

    import socket
    s=socket.socket()
    host=socket.gethostbyname('www.google.com')
    port=80
    s.connect((host,port))
    s.sendall("GET /\r\n")
    val = s.recv(10000)
    # Split off the HTTP headers
    val = val.split('\r\n\r\n',1)[1]