python-3.x sockets encoding tcp

Sending different file types over a TCP socket in Python 3 gives 'UnicodeDecodeError'


I'm writing a small application that lets me share files over my home LAN with as little headache as possible, so I want to support all file extensions.

When sending a text file I use the .encode() and .decode() functions and it works just fine, but when I try to send something else (a video, say) it raises the following error:

return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 47: character maps to <undefined>

Is there a way to send a file as it is, without having to encode it? Note that I tried sending the file without .encode() and it raises the exact same error.

The code:

def sendfile(file, client):
    try:
        fd = open(file, 'r')
    except:
        _media_error('Can not open specific file for sending')
        return 0

    resp = client.recv(128).decode()

    # other side is ready, give them the info
    if resp == '[ack]':
        buf = fd.read(_readsize)
        while buf:
            #client.send(buf.encode())
            client.send(buf)
            buf = fd.read(_readsize)
        fd.close()
        client.send('[done]'.encode())
        return 1
    else:
        fd.close()
        return 0

def recvfile(file, client):
    try:
        fd = open(file, 'w+')
    except:
        _media_error('Can not open specific file for receiving')
        return 0

    # ready give me the info
    client.send('[ack]'.encode())
    #buf = client.recv(_readsize).decode()
    buf = client.recv(_readsize)
    while buf:
        if buf != '[done]':
            fd.write(buf)
            buf = client.recv(_readsize)#.decode()
        else:
            fd.close()
            return 1
    return 0

(Ignore the messy returns I'll fix these later)


Solution

  • Since you are sending bytes across the network it's simplest to work exclusively with bytes.

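    The error comes from opening the file in text mode: read() then decodes the file's contents with the platform's default codec (a charmap codec in your traceback), and arbitrary binary data such as a video contains bytes that codec cannot map. Open your files in binary mode ('rb' and 'wb') and do not encode or decode file data. You will still need to encode your ack/done messages.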
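
The difference between the two modes can be reproduced without a socket. This is a minimal sketch, not part of the original code: the file name and sample bytes are made up for illustration, and cp1252 is forced explicitly to mimic a typical Windows default.

```python
import os
import tempfile

# Hypothetical sample data: byte 0x81 has no mapping in cp1252.
payload = b"\x00\x81\xff binary data"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(payload)

    # Text mode decodes on read; cp1252 is passed explicitly here so the
    # result does not depend on the machine running the example.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode hands back the bytes untouched.
    with open(path, "rb") as f:
        binary_data = f.read()
```

The text-mode read fails on byte 0x81, while the binary read returns exactly what was written. The question's two functions, switched to binary mode, follow.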

    def sendfile(file, client):
        try:
            fd = open(file, 'rb')  # binary mode: read() returns bytes
        except OSError:
            _media_error('Can not open specific file for sending')
            return 0

        resp = client.recv(128)

        # other side is ready, give them the info
        if resp == '[ack]'.encode():
            buf = fd.read(_readsize)
            while buf:
                # sendall() keeps sending until the whole buffer is written;
                # send() may transmit only part of it
                client.sendall(buf)
                buf = fd.read(_readsize)
            fd.close()
            client.sendall('[done]'.encode())
            return 1
        else:
            fd.close()
            return 0

    def recvfile(file, client):
        try:
            fd = open(file, 'wb+')  # binary mode: write() accepts bytes
        except OSError:
            _media_error('Can not open specific file for receiving')
            return 0

        # ready give me the info
        client.sendall('[ack]'.encode())
        buf = client.recv(_readsize)
        while buf:
            # NOTE: TCP is a byte stream, so this comparison only matches
            # when '[done]' arrives in a recv() call of its own
            if buf != '[done]'.encode():
                fd.write(buf)
                buf = client.recv(_readsize)
            else:
                fd.close()
                return 1
        return 0
    

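    Copying raw bytes like this does not depend on the endianness of either machine: each byte arrives exactly as it was read. Byte order only becomes a concern if you add multi-byte integer fields to the protocol, for example a length header, and in that case both sides should agree on a fixed order.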

    Also, you may need to consider special-casing text files if you are transferring between machines with different default encodings. For example, Windows machines tend to default to cp1252, while modern Linux systems default to UTF-8. In that situation you need to pick one encoding for the transfer and convert to and from it on each side.
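
If you do decide to special-case text files, the conversion on each side could look roughly like this. It is only a sketch: UTF-8 as the transfer encoding and the helper names `to_wire` and `from_wire` are assumptions for illustration, and each side still has to know (or guess) its own local encoding.

```python
WIRE_ENCODING = "utf-8"  # assumed transfer encoding agreed on by both sides

def to_wire(raw, source_encoding):
    """Sender side: re-encode locally encoded text bytes for transfer."""
    return raw.decode(source_encoding).encode(WIRE_ENCODING)

def from_wire(wire, target_encoding):
    """Receiver side: re-encode transferred bytes for the local machine.

    Raises UnicodeEncodeError if the target encoding cannot represent
    a character in the text.
    """
    return wire.decode(WIRE_ENCODING).encode(target_encoding)

# A cp1252 sender and a cp1252 receiver, with UTF-8 on the wire.
local_bytes = "café".encode("cp1252")
wire_bytes = to_wire(local_bytes, "cp1252")
restored = from_wire(wire_bytes, "cp1252")
```

Binary files should skip this step entirely and be sent byte for byte.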