
Socket received invalid start byte (UnicodeDecodeError, SOCK_STREAM)


I am using a blocking Python socket created with socket.socket(socket.AF_INET, socket.SOCK_STREAM) to send messages from my client to my server. If I send messages in quick succession (but not simultaneously), I get the following error on my server:

in receive
    size = int(rec_sock.recv(HEADER_SIZE).decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Before each message I send a header containing the length of the message that follows. The header is encoded as UTF-8 by the client, so decoding it shouldn't raise this error. The header is also the only part of the message that the receiver attempts to decode as UTF-8, so I am not sure how this error can happen.

I am using the following methods to send, receive, and make a header:

import pickle
import socket
from typing import Any

BUF_SIZE = 16384
HEADER_SIZE = 16

def receive(rec_sock: socket.socket) -> Any:
    message = b''
    size = int(rec_sock.recv(HEADER_SIZE).decode('utf-8'))

    if size:
        while len(message) < size:
            data = rec_sock.recv(BUF_SIZE)
            message += data

    return pickle.loads(message) if len(message) else None


def send(resp: Any, send_sock: socket.socket) -> None:
    pickeled = pickle.dumps(resp)
    send_sock.send(make_header(len(pickeled)))
    send_sock.send(pickeled)


def make_header(msg_size: int) -> bytes:
    encoded = str(msg_size).encode('utf-8')
    return b'0' * (HEADER_SIZE - len(encoded)) + encoded

Solution

  • The problem was that my receive method always asked recv for a full buffer (BUF_SIZE bytes), even when fewer bytes of the current message remained. TCP is a byte stream with no message boundaries, so when two messages arrived in quick succession, the recv call for the first message also consumed the header of the second. The next call to receive then read the pickled content of the second message as its header, and those bytes cannot be decoded as UTF-8 (pickle data for protocol 2 and later starts with the byte 0x80, which matches the error message).

    Changing the receive method to this fixed it:

    def receive(rec_sock: socket.socket) -> Any:
        message = b''
        size = int(rec_sock.recv(HEADER_SIZE).decode('utf-8'))
        print("Waiting for", size, "bytes ...")
    
        if size:
            while len(message) < size:
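                # Never request more than this message still needs, so the
                # next message's header stays in the socket buffer.
                remaining = size - len(message)
                data = rec_sock.recv(min(BUF_SIZE, remaining))
                if not data:
                    # recv returns b'' once the peer has closed the connection.
                    raise ConnectionError("socket closed before the full message arrived")
                message += data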
    
            print("Received", len(message), "bytes.")
    
        return pickle.loads(message) if len(message) else None
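
Two related gaps may be worth closing in the same way. recv(HEADER_SIZE) is allowed to return fewer than HEADER_SIZE bytes, and send may transmit only part of its argument. The following is a minimal sketch, not a definitive implementation: it keeps the same 16-byte zero-padded header as above, and recv_exact is a hypothetical helper name introduced here for illustration.

```python
import pickle
import socket
from typing import Any

BUF_SIZE = 16384
HEADER_SIZE = 16  # same fixed-width, zero-padded length header as above


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from sock (hypothetical helper, not in the original)."""
    chunks = []
    remaining = n
    while remaining > 0:
        # Ask only for what this message still needs, capped at BUF_SIZE.
        chunk = sock.recv(min(BUF_SIZE, remaining))
        if not chunk:
            # recv returns b'' once the peer has closed the connection.
            raise ConnectionError(f"socket closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send(resp: Any, send_sock: socket.socket) -> None:
    payload = pickle.dumps(resp)
    header = str(len(payload)).zfill(HEADER_SIZE).encode("ascii")
    # sendall keeps writing until all bytes are sent; send may stop early.
    send_sock.sendall(header + payload)


def receive(rec_sock: socket.socket) -> Any:
    # The header is read with the same exact-length loop as the body.
    size = int(recv_exact(rec_sock, HEADER_SIZE).decode("ascii"))
    return pickle.loads(recv_exact(rec_sock, size)) if size else None
```

Because both the header and the body go through recv_exact, a call to receive consumes exactly one message from the stream, however the bytes happen to be split or merged in transit.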