python sockets unix unix-socket

Unix domain socket stops working after exactly 219264 bytes


I'm trying to receive several megabytes of data over a Unix domain socket. My problem is that the received data always cuts off at 219,264 bytes. After receiving that amount I can keep sending, but I cannot receive anything more until I completely reset the connection. My code, reduced to a minimal example (assuming data is regularly sent to the socket), is below.

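import socket

# socketpath holds the path to the IPC socket (defined elsewhere)
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect(socketpath)

# Receiving loop
data = b""
while True:
    received = s.recv(1024)
    data += received
    print("Received some data")

# Sending loop (run separately from the receiving loop)
while True:
    s.send(b"{a JSON-RPC string}")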

The server side is go-ethereum. Receiving raises no exception; the call to recv simply hangs. Sending raises OSError: [Errno 32] Broken pipe in s.send(...) once 219264 bytes have been sent in total. Also, once I start getting Broken pipe, recv begins to respond again but always returns an empty result (b'').

Edit: this now seems to happen at 36544 bytes as well. Which of the two limits I hit appears to be completely random.


Solution

  • The observed behavior comes from hitting the IPC buffer limit. When a process writes more data than the socket buffer can hold and nothing drains it, the write blocks and never completes. If the other end is itself stuck waiting, for example blocked in recv on data that will never arrive, neither side makes progress and you end up with a deadlock.

    This is a limit of your operating system when using an AF_UNIX socket. You can see the same behavior with a socket pair created over AF_UNIX: a small C program that checks the return value of send will see the call fail with EWOULDBLOCK once the buffer is full, provided the MSG_DONTWAIT flag is set on the send call.
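
    The same experiment can be sketched in Python rather than C. This is only an illustration, assuming a Unix-like system: fill_send_buffer is a hypothetical helper name, and a non-blocking socket stands in for the MSG_DONTWAIT flag. It writes into one end of an AF_UNIX socket pair without ever reading from the other end, and counts how many bytes the kernel accepts before refusing more.

```python
import socket


def fill_send_buffer(chunk_size=1024):
    """Send on a non-blocking AF_UNIX socket pair until the kernel refuses more data.

    Returns the number of bytes accepted before the buffer was full.
    (Hypothetical helper for illustration only.)
    """
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    writer.setblocking(False)  # comparable to passing MSG_DONTWAIT on every send
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            # send() may accept fewer bytes than requested; count what it reports.
            total += writer.send(chunk)
    except BlockingIOError:
        # EWOULDBLOCK / EAGAIN: nobody is reading, so the buffer is full.
        pass
    finally:
        writer.close()
        reader.close()
    return total


if __name__ == "__main__":
    print("bytes accepted before EWOULDBLOCK:", fill_send_buffer())
```

    The number printed depends on the operating system and its socket buffer settings, so it will not necessarily match the figures in the question.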

    There is a great write-up here: ipc buffers. It shows empirically that 219264 bytes is exactly the limit on one particular Linux system.

    There are several possible solutions. One option is to send the data back in smaller pieces and assemble them on the receiving side. A second is to check for this limit and perform some kind of error handling. A third is to stop using Unix domain sockets and accept the overhead of a network protocol (TCP/IP). A fourth is to add some concurrency so that the reader and the writer are not in the same process. Other solutions exist, but these should point you toward a quick fix.
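
    As a rough sketch of the first and fourth options combined, the example below keeps a reader draining the socket while the writer sends, and assembles the pieces on the receiving side. It is an assumption-laden illustration, not the asker's setup: it uses a local socket pair instead of go-ethereum, a thread as a stand-in for a separate reader process, and the hypothetical helper names receive_all and transfer.

```python
import socket
import threading


def receive_all(sock, chunks):
    """Collect chunks until the peer closes its end (recv returns b'')."""
    while True:
        chunk = sock.recv(65536)
        if not chunk:
            break
        chunks.append(chunk)


def transfer(payload):
    """Send payload over an AF_UNIX socket pair while a thread drains the other end."""
    writer, reader = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    chunks = []
    reader_thread = threading.Thread(target=receive_all, args=(reader, chunks))
    reader_thread.start()
    try:
        # sendall() blocks whenever the buffer is full, but the reader thread
        # keeps emptying it, so the call eventually completes.
        writer.sendall(payload)
    finally:
        writer.close()  # closing the writer lets the reader see end-of-stream
    reader_thread.join()
    reader.close()
    return b"".join(chunks)


if __name__ == "__main__":
    data = b"a" * (4 * 1024 * 1024)  # several megabytes, well above the buffer size
    print("round trip intact:", transfer(data) == data)
```

    Because the reader never stops consuming, the writer is never left blocked on a full buffer, which is the situation that leads to the hang described in the question.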