Incoming HTTP requests don't get read from Python socket in time


I'm making a simple HTTP server in Python using the socket module as a personal exercise to understand the HTTP protocol, and it appears that some incoming requests don't get read from the socket until a new request comes along.

The (very) summarized version of the code is:

import socket 

id = 0

def handle_request(clientSocket):

    has_body = lambda l: "POST" in l or "PUT" in l or "PATCH" in l

    with clientSocket.makefile() as incomingMessage:
        global id
        requestFirstLine = ""
        requestHeaders = ""
        requestBody = ""
        requestID = id

        linesRead = 0
        blanksRead = 0
        maxBlanks = 2
        
        for line in incomingMessage:
        
            if linesRead == 0:
                requestFirstLine = line
                # HTTP methods that don't have a body have a single blank line at the end
                # Methods that do have a body have one between the headers and the body and one at the end
                if not has_body(requestFirstLine):
                    maxBlanks -= 1
        
            if linesRead > 0 and blanksRead == 0 and line not in ("\r\n", "\n"):
                requestHeaders += line
                
            if linesRead > 0 and blanksRead == 1 and line not in ("\r\n", "\n"):
                requestBody += line
                
            if line == "\r\n" or line == "\n":
                blanksRead += 1
            
            if blanksRead == maxBlanks:
                # Once all lines are read, process the request
                try:
                    # process the HTTP request and generate a response
                    print(f"Request {requestFirstLine} ID: {requestID}")
                    response = dummy_method(requestFirstLine, requestHeaders, requestBody, requestID)
                    clientSocket.sendall(response)
                    print("Request Answered!")
                    id += 1
                    return True
                except Exception as e:
                    print(e)
                    return False
            
            linesRead += 1
    return False # failsafe

def run_server(port):

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as serverSocket:
        serverSocket.bind(("localhost", port))
        serverSocket.listen(5)
        while True:
            clientSocket, address = serverSocket.accept()
            print(f"Incoming connection from {address}")
            success = handle_request(clientSocket)
            if success:
                pass  # Handle this

This works wonderfully for simple requests, like fetching a single HTML file. However, when I tried to use this server to load a webpage with HTML, CSS (/assets/css/main.css) and an .ico icon (/assets/favicon.ico) in Firefox, only the first two requests got processed, as the log shows:

Incoming connection from ('127.0.0.1', 49776)
    Request: GET / HTTP/1.1 ID: 0
Request Answered!

Incoming connection from ('127.0.0.1', 49782)
    Request: GET /assets/css/main.css HTTP/1.1 ID: 1
Request Answered!

Notice that the request for the .ico file was not received. When a new request is made, for example with curl -i -X GET http://localhost:9999/, this gets printed in the log:

Incoming connection from ('127.0.0.1', 49776)
    Request: GET / HTTP/1.1 ID: 0
Request Answered!

Incoming connection from ('127.0.0.1', 49782)
    Request: GET /assets/css/main.css HTTP/1.1 ID: 1
Request Answered!

Incoming connection from ('127.0.0.1', 52828)
    Request: GET / HTTP/1.1 ID: 2
Request Answered!

Incoming connection from ('127.0.0.1', 52834)
    Request: GET /assets/favicon.ico HTTP/1.1 ID: 3
Request Answered!

Only after the GET request from curl was answered did the GET /assets/favicon.ico request from Firefox get answered, and I cannot understand why that happens.

My theory is that processing the second request (GET /assets/css/main.css) takes so long that Python is still busy with it when the next request arrives, so by the time execution returns to the serverSocket.accept() line, the new request is not caught in time. It is still recovered somehow, though, and I do not understand how that happens either.

Any help in understanding this issue will be greatly appreciated.


Solution

  • The issue you're running into probably happens because your server handles one connection at a time: handle_request() reads a single request, returns, and the loop goes straight back to blocking on serverSocket.accept(). Firefox fires off several requests in quick succession (HTML, CSS, favicon, etc.), and with HTTP/1.1 it may send a later request over a connection it already has open instead of opening a new one.

    accept() only ever hands you new connections; it never tells you that an existing socket has more data waiting. A connection sitting in the OS backlog would be picked up by the very next accept() call, so the favicon request was most likely sitting unread on a socket your loop had already moved past. In your log it eventually shows up on a fresh connection (port 52834) right after curl's, which is consistent with Firefox retrying on a new connection once the curl connection made accept() return and the old socket went away.
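
To see how a request can sit unread while the server waits in accept(), the following sketch uses socket.socketpair() as a stand-in for a single browser-to-server connection. It assumes the browser sends its second request over the connection it already has open (HTTP/1.1 keep-alive); read_one_request is a hypothetical helper for illustration, not part of the code above.

```python
import select
import socket


def read_one_request(conn):
    """Read bytes until the blank line that ends the request headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:  # peer closed the connection
            break
        data += chunk
    return data


# socketpair() gives two connected sockets: one plays the browser,
# the other plays the socket the server got back from accept().
browser, server_side = socket.socketpair()

# First request: the server reads it and answers, as handle_request() would.
browser.sendall(b"GET /assets/css/main.css HTTP/1.1\r\nHost: localhost\r\n\r\n")
first = read_one_request(server_side)
server_side.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")

# Assumption: the browser reuses the same connection for its next request.
browser.sendall(b"GET /assets/favicon.ico HTTP/1.1\r\nHost: localhost\r\n\r\n")

# A loop that only calls accept() never looks at server_side again,
# yet select() reports that the socket has data waiting to be read.
readable, _, _ = select.select([server_side], [], [], 1.0)
pending = server_side in readable
second = read_one_request(server_side) if pending else b""

browser.close()
server_side.close()
```

Here pending tells you whether the second request was waiting on the already-open socket, which is the situation an accept()-only loop cannot see.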

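    A good way to fix this is to use select.select(), which lets your server wait on several sockets at once (the listening socket plus every open client connection) and tells you which ones are ready to read, so no single idle socket can hold everything else up.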

    import select
    
    def run_server(port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as serverSocket:
            serverSocket.bind(("localhost", port))
            serverSocket.listen(5)
            serverSocket.setblocking(False)  # Make it non-blocking
    
            connections = []
            while True:
                readable, _, _ = select.select([serverSocket] + connections, [], [])
    
                for s in readable:
                    if s is serverSocket:
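                        clientSocket, address = serverSocket.accept()
                        # handle_request() uses makefile(), which needs a blocking socket
                        clientSocket.setblocking(True)
                        connections.append(clientSocket)
                        print(f"Incoming connection from {address}")
                    else:
                        # s has data waiting (or was closed by the peer)
                        success = handle_request(s)
                        connections.remove(s)
                        s.close()  # one request per connection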
    

    Hopefully this helps; there may be an easier way, though!
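
One possibly easier alternative, offered as a sketch under assumptions rather than as part of the original answer, is to give every accepted connection its own thread, so a slow or idle client cannot hold up the others. The names handle_connection, serve and start_server are hypothetical, and handle_connection is a simplified stand-in for handle_request that answers a single request and then closes the socket.

```python
import socket
import threading


def handle_connection(conn):
    """Hypothetical stand-in for handle_request(): answer one request, then close."""
    with conn:  # closes the socket when the block exits
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(1024)
            if not chunk:  # client closed without sending a full request
                return
            data += chunk
        body = b"ok"
        head = (
            "HTTP/1.1 200 OK\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n"
        ).encode()
        conn.sendall(head + body)


def serve(server_socket):
    """Accept connections forever, handing each one to its own thread."""
    while True:
        try:
            conn, _address = server_socket.accept()
        except OSError:
            return  # the listening socket was closed
        threading.Thread(
            target=handle_connection, args=(conn,), daemon=True
        ).start()


def start_server(port=0):
    """Bind, listen, and run serve() in a background thread (port 0: OS picks)."""
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_socket.bind(("localhost", port))
    server_socket.listen(5)
    threading.Thread(target=serve, args=(server_socket,), daemon=True).start()
    return server_socket
```

Calling start_server(9999) would listen on the port used in the question. Sending "Connection: close" and closing the socket after each response keeps the sketch at one request per connection; supporting keep-alive would mean looping inside handle_connection instead.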