python, linux, http, select

How to make chunked streaming PUT request to BaseHTTPRequestHandler not lose bytes?


Consider the following MCVE, a Python program that:

  • sets the global variable FIXME during startup from the first command-line argument
  • in one thread starts an HTTP server
    • that listens for chunked PUT requests using select on self.rfile.fileno()
    • and prints the received data
  • in the main thread runs a client for that HTTP server
    • if FIXME is true, it sleeps for 100 milliseconds
    • then it sends a chunked PUT request with a single line of data using the requests module.

#!/usr/bin/env python

import os
import select
import sys
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Iterator

import requests

ip = "127.0.0.1"
port = 10000
FIXME = True


class MyServer(BaseHTTPRequestHandler):
    def do_PUT(self):
        inputfd = self.rfile.fileno()
        os.set_blocking(inputfd, False)
        while 1:
            print("SERVER SELECT")
            sel = select.select([inputfd], [], [inputfd])
            if inputfd in sel[2]:
                print(f"SERVER {inputfd} errored")
                break
            if inputfd not in sel[0]:
                print(f"SERVER read file descriptor closed")
                break
            chunk: bytes = os.read(inputfd, 8192)
            print(f"SERVER RECV {len(chunk)} {chunk!r}")
            if len(chunk) == 0:
                print(f"SERVER len(chunk) == 0")
                break
            if chunk:
                print(f"{chunk!r}")
            if b"0\r\n\r\n" in chunk:
                # terminating chunk
                break
        self.send_response(200)
        self.end_headers()


def server():
    with ThreadingHTTPServer((ip, int(port)), MyServer) as webServer:
        webServer.serve_forever()


class Client:
    def get_chunks_to_write(self) -> Iterator[bytes]:
        if FIXME:
            time.sleep(0.1)
        print(f"CLIENT write")
        yield b"123"

    def main(self):
        requests.put(f"http://{ip}:{port}", data=self.get_chunks_to_write())


def cli():
    threading.Thread(target=server, daemon=True).start()
    # Wait for server startup
    time.sleep(0.5)
    Client().main()


if __name__ == "__main__":
    FIXME = int(sys.argv[1])
    cli()

When FIXME is set, the execution is fine:

$ ./test.py 1
SERVER SELECT
CLIENT write
SERVER RECV 13 b'3\r\n123\r\n0\r\n\r\n'
b'3\r\n123\r\n0\r\n\r\n'
127.0.0.1 - - [27/Nov/2023 11:14:01] "PUT / HTTP/1.1" 200 -

However, when FIXME is false, the execution blocks and the bytes are lost (it takes about 10 tries to reproduce with this MCVE, but it happens every time in the real program):

$ ./test.py 0
CLIENT write
SERVER SELECT

What can I do to remove the extra sleep? What synchronization is missing? What is happening?

I think this happens because the client closes the connection or transfers the bytes before the server can enter select. But I do not know how that could influence anything, as select should still allow processing the remaining bytes on the socket.

I tried searching the net. Calling os.set_blocking(inputfd, True) and then os.read(inputfd, 1) does not read the transferred bytes either. I assume this is because they were already read by BaseHTTPRequestHandler. How can I access them?
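
The assumption in the last paragraph can be checked in isolation. The following is a minimal sketch, not part of the original program: it assumes that the handler's rfile behaves like the buffered file object returned by socket.makefile("rb"), and the names server_sock, client_sock and buffered are purely illustrative.

```python
import select
import socket

# A connected socket pair stands in for the client/server TCP connection.
server_sock, client_sock = socket.socketpair()

# The client sends the request line, a blank line and the chunked body at once,
# which is roughly what arrives when there is no sleep before the body.
client_sock.sendall(b"PUT / HTTP/1.1\r\n\r\n3\r\n123\r\n0\r\n\r\n")

# Buffered reader over the socket (assumed to match the handler's rfile).
rfile = server_sock.makefile("rb")

# Reading one line pulls everything available on the socket into the buffer.
request_line = rfile.readline()

# The socket itself is now drained, so select() has nothing to report.
readable, _, _ = select.select([server_sock], [], [], 0.1)

# The body is not lost; it is waiting inside rfile's buffer.
buffered = rfile.peek()

print(request_line, readable, buffered)

rfile.close()
server_sock.close()
client_sock.close()
```

If the sketch matches what happens in the MCVE, the select call on the raw file descriptor blocks because the chunked body has already moved from the socket into the buffer of self.rfile.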


Solution

  • select.select([inputfd]
    

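    You are right: Python reads from the socket first and buffers the data. select watches the underlying socket, but BaseHTTPRequestHandler parses the request line and headers through self.rfile, which is a buffered reader. If the body arrives together with the headers, a single buffered read pulls it into the buffer of self.rfile, the socket has nothing left to report, and select blocks even though the bytes are available. Do not use select.select() and os.read() on the raw socket. Instead, use the Python wrappers with blocking reads, such as self.rfile.read() and self.rfile.readline(), which consume the buffer managed by Python.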
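
    As an illustration of that advice, here is a minimal sketch of reading a chunked body through the buffered file object only. The helper name read_chunked is hypothetical (it is not part of http.server), error handling is kept to a minimum, and io.BytesIO stands in for self.rfile so the snippet runs on its own.

```python
import io
from typing import BinaryIO, Iterator


def read_chunked(rfile: BinaryIO) -> Iterator[bytes]:
    """Yield the payload of each chunk of a chunked request body."""
    while True:
        # Chunk header: hexadecimal size, optionally followed by ";extensions".
        size_line = rfile.readline()
        if not size_line:
            raise ConnectionError("connection closed before the terminating chunk")
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            # Terminating chunk: skip optional trailer fields up to the blank line.
            while rfile.readline() not in (b"\r\n", b"\n", b""):
                pass
            return
        data = rfile.read(size)  # blocks until `size` bytes arrive or EOF
        if len(data) < size:
            raise ConnectionError("connection closed in the middle of a chunk")
        rfile.readline()  # consume the CRLF that follows the chunk data
        yield data


# Stand-in for self.rfile, fed with the bytes shown in the question's output.
body = io.BytesIO(b"3\r\n123\r\n0\r\n\r\n")
chunks = list(read_chunked(body))
print(chunks)
```

    Inside do_PUT this could be used as `for chunk in read_chunked(self.rfile): ...`, leaving the file descriptor in blocking mode. Note that self.rfile.read() without a size waits for end of file, so for a chunked body on a connection that stays open it is safer to pass the chunk size explicitly, as above.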