Tags: python, python-3.x, sockets, httpserver

Get incoming URL from listening socket/HTTPServer


Good day Stackoverflow,

This morning I ran into a problem for which I can't seem to find a working answer. I'm trying to get the full URL (what shows up in the address bar) using either an HTTPServer or a plain socket. A server redirects me to localhost with the token and scope variables in the URL (as seen in the URL below). Nothing is running on localhost (no web server, no pages) except the listening code below. My desired result is to have these variables saved so I can work with them: http://localhost/#token=aai789as&scope=book%3Aedit+chat%3Aedit

I have tried the following with some progress but not the desired result:

from http.server import SimpleHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

class MyHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        qs = {}
        path = self.path
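        if '?' in path:
            # Split off the query string and parse it into a dict of lists
            path, tmp = path.split('?', 1)
            qs = parse_qs(tmp)  # parse_qs is imported directly, not via urlparse

        print(self.path)
        print(path, qs)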

    def log_request(self, code=None, size=None):
        print('Request')

    def log_message(self, format, *args):
        print('Message')

if __name__ == "__main__":
    try:
        server = HTTPServer(('localhost', 80), MyHandler)
        print('Started http server')
        server.serve_forever()
    except KeyboardInterrupt:
        print('^C received, shutting down server')
        server.socket.close()

The snippet above runs but doesn't print anything useful; in fact, it prints only blank lines. It does detect that a connection is being made, though. So does this:

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("localhost", 80))
s.listen(1)

conn, addr = s.accept()
d = conn.recv(4096)
conn.close()

print(d)

This one DOES return more than blank lines, but it still isn't enough to get the variables from the URL:

b'\x07\n'
b'GET / HTTP/1.1\r\nHost: localhost\r\nUser-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\nAccept-Language: en,en-US;q=0.7,nl;q=0.3\r\nAccept-Encoding: gzip, deflate\r\nDNT: 1\r\nConnection: keep-alive\r\nUpgrade-Insecure-Requests: 1\r\n\r\n'

I don't know what I'm supposed to be doing, and since I don't know exactly what I'm looking for, searching through the documentation has taken up the better half of my day. So I turn to the ever-helpful Stack Overflow in the hope of finding better knowledge than I possess.

Thank you for your time, - Brent


Solution

  •  http://localhost/#token=aai789as&scope=book%3Aedit+chat%3Aedit
    

    This kind of URL is only partly sent to the server. The # and everything after it (the fragment) are known only to the browser and can be read as location.hash from within JavaScript. The fragment is not sent to the server, so all the server sees is http://localhost/.
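
To make the split concrete, here is a minimal sketch using the standard library's urllib.parse on the URL from the question. The variable names are only illustrative. It shows which part a browser would put on the request line and that the fragment, although never sent, happens to be formatted like a query string and can be parsed the same way once you have it.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://localhost/#token=aai789as&scope=book%3Aedit+chat%3Aedit"
parts = urlsplit(url)

# What a browser puts on the request line: path plus query, never the fragment.
request_target = parts.path + ("?" + parts.query if parts.query else "")
print(request_target)  # /

# The fragment stays in the browser; it is formatted like a query string,
# so parse_qs can decode it once it is available.
fragment_params = parse_qs(parts.fragment)
print(fragment_params)  # {'token': ['aai789as'], 'scope': ['book:edit chat:edit']}
```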

    b'GET / HTTP/1.1\r\nHost: localhost\r\n ...'
    

    This part provides everything from the URL that the server will ever know. The localhost in the Host header gives the hostname and GET / gives the path /, which together make 'http://' + 'localhost' + '/', i.e. http://localhost/.
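
The reconstruction described above can be sketched in a few lines. This is a simplified illustration, not a full HTTP parser: the helper name url_from_request is made up for this example, and the scheme is assumed to be http because the request itself does not carry it.

```python
def url_from_request(raw: bytes, scheme: str = "http") -> str:
    """Rebuild the URL a server can know from a raw HTTP/1.1 request."""
    # Keep only the header section (everything before the blank line).
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ", 2)

    # Collect headers into a dict keyed by lower-cased name.
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return f"{scheme}://{headers.get('host', '')}{target}"


raw = b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\n\r\n"
print(url_from_request(raw))  # http://localhost/
```

Feeding it the bytes printed by the socket example in the question gives the same result: there is no token or scope anywhere in what the server receives.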

    For more information, see the Stack Overflow question "Is the anchor part of a URL being sent to a web server?".
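
Since location.hash is readable from JavaScript, one possible workaround is to let the listening server answer the first request with a tiny page whose script resends the fragment as a query string, which the server does receive. The following is only a sketch under assumptions that are not in the original post: the /callback path, port 8080 (port 80 usually needs elevated privileges), and the names RELAY_PAGE, RelayHandler, extract_params and wait_for_token are all invented for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, parse_qs

# Page served for any path other than /callback (hypothetical path).
# Its script moves the fragment into a query string the server can see.
RELAY_PAGE = b"""<!doctype html>
<script>
  if (location.hash.length > 1) {
    location.replace("/callback?" + location.hash.slice(1));
  }
</script>
"""


def extract_params(path):
    """Return the query parameters if path targets /callback, else None."""
    parts = urlsplit(path)
    if parts.path != "/callback":
        return None
    return parse_qs(parts.query)


class RelayHandler(BaseHTTPRequestHandler):
    received = {}  # filled in once /callback is requested
    timeout = 5    # drop idle connections so the loop below cannot hang

    def do_GET(self):
        params = extract_params(self.path)
        if params is None:
            body, content_type = RELAY_PAGE, "text/html"
        else:
            RelayHandler.received = params
            body, content_type = b"Token received.", "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def wait_for_token(port=8080):
    """Serve requests one at a time until the relayed parameters arrive."""
    server = HTTPServer(("localhost", port), RelayHandler)
    try:
        while not RelayHandler.received:
            server.handle_request()
    finally:
        server.server_close()
    return RelayHandler.received
```

Calling wait_for_token() blocks until the browser has been redirected to the listener and the relay page has run. With the URL from the question, it should return the token and scope as a dict of lists. The redirect target registered with the remote server would have to match the port chosen here.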