
Python TCP networking inconsistencies


I'm new to networking and I'm trying to write a simple TCP client. The first thing I have to do is to send my message length and wait for further instructions. Here's my code:

import socket
import struct

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))

s.send(bytearray(message_serialised_length))

data = s.recv(BUFFER_SIZE)

s.close()
print("response: {}".format(struct.unpack('>I', data)))

Sometimes I'm able to get a response and the program continues, but other times I run into this error: struct.error: unpack requires a buffer of 4 bytes. I'm not sure why this happens or how to fix it. What steps should I take, and how can I better understand this problem? I tried removing the struct library and using .decode() instead, but that gave me empty values.


Solution

  • You received something other than exactly 4 bytes (more, if part of the next message arrived in the same recv, or fewer), and unpack requires exactly four bytes for '>I'. TCP is a streaming protocol: it delivers a stream of bytes, not messages. If the server sends 4 bytes for a message size and then sends the message, a call to recv(BUFFER_SIZE) can return anywhere from 1 to BUFFER_SIZE bytes, or zero bytes if the server closed the connection. You have to check that you have at least 4 bytes in the buffer, strip off those bytes and convert them to a length, then keep buffering the results of further recv calls until you have the number of bytes specified by the message length.
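    A minimal sketch of those steps as plain functions, separate from the buffered class shown later. It swaps in one simplification: instead of keeping leftover bytes in its own buffer, it asks recv() for no more than the bytes still missing, so anything belonging to the next message stays queued in the socket. The '>I' format follows the question; recv_exact and recv_message are illustrative names, and socket.socketpair() stands in for a real server.

```python
import socket
import struct

def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Call recv() repeatedly until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        # Ask for no more than we still need, so bytes that belong to the
        # next message stay in the socket for the next call.
        chunk = sock.recv(remaining)
        if not chunk:  # b'' means the peer closed the connection
            raise ConnectionError(
                f"connection closed with {remaining} of {size} bytes missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock: socket.socket) -> bytes:
    """Read a 4-byte big-endian length (the question's '>I'), then that many bytes."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# Local demonstration with a connected socket pair instead of a real server.
a, b = socket.socketpair()
a.sendall(struct.pack(">I", 5) + b"hello" + struct.pack(">I", 2) + b"hi")
print(recv_message(b))  # b'hello'
print(recv_message(b))  # b'hi'
a.close()
b.close()
```

    Because struct.unpack only ever sees the 4 bytes returned by recv_exact, the "unpack requires a buffer of 4 bytes" error cannot occur; a connection that closes early raises ConnectionError instead.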

    Here's a clear example. The following server code waits for a client connection, then sends a random number (1 to 1000) of bytes with value 0, then a random number of bytes with value 1, and so on, finishing with bytes of value 255. Each run of the same value is sent in a separate .sendall() call.

    The client simply connects to the server and calls .recv(4096) in a loop, printing how many bytes it received along with the set of unique byte values received. Receiving the same value in two different .recv calls indicates the data was split across the two receives. Receiving more than one value in a single recv indicates the sends were combined into one receive.

    server.py

    from socket import *
    import random
    
    s=socket()
    s.bind(('',5000))
    s.listen()
    while True:
        c,a = s.accept()
        print(f'Connection from {a}')
        with c:
            for i in range(256):
                length = random.randint(1,1000)
                data = bytes([i] * length)
                c.sendall(data)
        print('Disconnected.')
    

    client.py

    from socket import *
    s=socket()
    s.connect(('192.168.1.5',5000))
    while True:
        data = s.recv(4096)
        if not data: break
        print(len(data),set(data))
    

    Output:

    473 {0}               # All the zeros were recv'd in the first recv
    1704 {1, 2, 3}        # Three sends were combined
    4096 {4, 5, 6, 7, 8, 9, 10, 11, 12}   # 12s were split across two recvs
    4096 {12, 13, 14, 15, 16}
    4096 {16, 17, 18, 19, 20, 21, 22, 23, 24, 25}
    4096 {32, 33, 34, 35, 36, 25, 26, 27, 28, 29, 30, 31}
    4096 {36, 37, 38, 39, 40, 41, 42}
    4096 {42, 43, 44, 45, 46, 47, 48, 49, 50, 51}  # 51 was split across recvs.
    3495 {51, 52, 53, 54, 55, 56, 57, 58, 59}      # Didn't get a full 4096.
    4096 {64, 65, 66, 60, 61, 62, 63} # Note: sets are unordered, so the numbers aren't sequential
    1472 {66, 67, 68, 69, 70, 71}     # Rarely gets the full 4096 after this point.
    839 {72, 73}
    1577 {74, 75, 76}
    1610 {80, 77, 78, 79}
    1566 {81, 82}
    124 {83}
    285 {84}
    2087 {85, 86, 87}
    4096 {88, 89, 90, 91, 92, 93}
    168 {93}
    1270 {94, 95}
    1737 {96, 97, 98}
    1593 {99, 100, 101}
    1674 {102, 103}
    1290 {104, 105, 106, 107}
    1266 {108, 109}
    3499 {110, 111, 112, 113, 114, 115}
    2694 {116, 117, 118, 119, 120, 121, 122}  # split 122s even without a full buffer.
    666 {122, 123}
    1532 {124, 125, 126, 127}
    1065 {128, 129}
    1121 {130, 131}
    1601 {132, 133, 134, 135}
    1507 {136, 137}
    1055 {138, 139}
    1663 {140, 141}
    1286 {144, 145, 142, 143}
    1796 {146, 147, 148}
    2199 {149, 150, 151}
    1636 {152, 153, 154}
    809 {155, 156}
    1462 {157, 158}
    576 {159}
    656 {160, 161}
    1693 {162, 163, 164, 165}
    1460 {168, 169, 166, 167}
    27 {169}
    1679 {170, 171, 172, 173}
    505 {176, 174, 175}
    736 {177, 178}
    3910 {179, 180, 181, 182, 183, 184, 185}
    1452 {186, 187, 188}
    1772 {192, 189, 190, 191}
    2112 {193, 194, 195, 196}
    1712 {200, 197, 198, 199}
    1769 {201, 202}
    1024 {203, 204}
    984 {205, 206}
    339 {207}
    1869 {208, 209}
    1819 {210, 211}
    624 {212}
    1474 {216, 213, 214, 215}
    1619 {217, 218, 219}
    842 {220, 221, 222}
    746 {224, 223}
    1453 {225, 226}
    4096 {227, 228, 229, 230, 231, 232, 233}
    1514 {233, 234, 235, 236}
    1554 {237, 238}
    1677 {240, 241, 239}
    1450 {242, 243}
    522 {244, 245}
    377 {246}
    1919 {247, 248, 249, 250, 251}
    1437 {252, 253, 254, 255}
    

    Note that this was run between two separate computers (a "real" situation). When the client and server run on the same machine and connect through "localhost", the sends are often not fragmented at all. That is unfortunate, because people learn and test on a single machine and then wonder why their code breaks later.
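    One way to exercise reassembly logic without a second computer is to feed it from a fake stream that never returns more than a few bytes per recv(). ChoppyStream below is a hypothetical test double, not part of the socket module, and recv_exact/read_message are illustrative helpers using the question's '>I' length prefix. Since every read is split, a reader that assumes one recv equals one message fails on the first run rather than only on a real network.

```python
import random
import struct

class ChoppyStream:
    """Hypothetical test double: a socket-like recv() that hands back at most
    `max_chunk` bytes per call, so split reads happen on every run."""

    def __init__(self, data: bytes, max_chunk: int = 3, seed: int = 0):
        self._data = data
        self._pos = 0
        self._max_chunk = max_chunk
        self._rng = random.Random(seed)  # seeded so failures are reproducible

    def recv(self, bufsize: int) -> bytes:
        n = min(bufsize, self._rng.randint(1, self._max_chunk))
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk  # b'' once exhausted, like a closed connection

def recv_exact(sock, size: int) -> bytes:
    """Loop on recv() until `size` bytes arrive; raise if the stream ends first."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("stream ended mid-message")
        buf += chunk
    return buf

def read_message(sock) -> bytes:
    """Read a 4-byte big-endian length, then exactly that many bytes."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)

stream = ChoppyStream(struct.pack(">I", 5) + b"hello" + struct.pack(">I", 3) + b"abc")
print(read_message(stream))  # b'hello'
print(read_message(stream))  # b'abc'
```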

    To actually implement messages over TCP, one technique is the one the original post's code attempts: send the message size, followed by the message. Here's an example of that:

    server.py

    from socket import *
    import random
    import struct
    
    s=socket()
    s.bind(('',5000))
    s.listen()
    while True:
        c,a = s.accept()
        print(f'Connection from {a}')
        with c:
            for i in range(256):
                length = random.randint(1,1000)
                data = bytes([i] * length)
                c.sendall(struct.pack('<I',length))  # send 4-byte length
                c.sendall(data)                      # send actual message
        print('Disconnected.')
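    The question's client needs to produce the same kind of 4-byte prefix when it sends. If message_serialised_length is a plain int (the post doesn't say, so this is an assumption), bytearray(message_serialised_length) does not encode it: it allocates that many zero bytes. A minimal sketch of a sender that packs the length with struct and uses sendall(), which, unlike send(), keeps sending until all data has been sent or an error occurs. send_message is an illustrative name, and joining header and payload into one call is just a convenience.

```python
import socket
import struct

# bytearray(n) with an int n allocates n zero bytes; it does not encode n.
print(bytearray(3))          # bytearray(b'\x00\x00\x00')
# struct.pack with 'I' and an explicit byte order always yields exactly 4 bytes.
print(struct.pack(">I", 3))  # b'\x00\x00\x00\x03'  (big-endian, as in the question)
print(struct.pack("<I", 3))  # b'\x03\x00\x00\x00'  (little-endian, as in the server above)

def send_message(sock: socket.socket, payload: bytes, fmt: str = ">I") -> None:
    """Send a 4-byte length prefix followed by the payload.

    Both ends must use the same `fmt`. sendall() retries until the whole
    buffer is sent (or raises), whereas send() may send only part of it.
    """
    sock.sendall(struct.pack(fmt, len(payload)) + payload)
```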
    
    
    

    client.py

    from socket import *
    import struct
    
    class Client:
        '''Buffer the TCP stream and extract meaningful data on message boundaries.'''
    
        def __init__(self,addr,port):
            self.sock = socket()
            self.sock.connect((addr,port))
            self.buffer = b''
    
        def __enter__(self):
            return self
    
        def __exit__(self,*args):
            self.sock.close()
    
        def get_raw(self,size):
            '''Request a specific number of bytes to extract from the stream.'''
            while len(self.buffer) < size:
                data = self.sock.recv(4096)
                if not data: # server closed
                    return b''
                self.buffer += data
            msg,self.buffer = self.buffer[:size],self.buffer[size:]
            return msg
    
        def get(self):
            '''Extract a message from the stream.'''
            raw = self.get_raw(4)  # Get the 4-byte length data
            if not raw:
                return b''
            length = struct.unpack('<I',raw)[0]
            return self.get_raw(length) # Return the actual message
    
    with Client('192.168.1.5',5000) as c:
        while True:
            data = c.get()
            if not data: break
            print(len(data),set(data))
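    An alternative to buffering by hand, sketched here as a swap-in rather than what this answer uses: socket.makefile('rb') wraps a blocking socket in a buffered binary file, and read(n) on that object keeps reading until it has n bytes or the stream ends, so a short result means the peer closed. iter_messages is an illustrative name; the '<I' default matches the server above.

```python
import socket
import struct

def iter_messages(sock: socket.socket, fmt: str = "<I"):
    """Yield length-prefixed messages until the peer closes the connection.

    `fmt` must match the sender; '<I' matches the length-prefix server above.
    """
    header_size = struct.calcsize(fmt)  # 4 for 'I' with an explicit byte order
    with sock.makefile("rb") as stream:
        while True:
            header = stream.read(header_size)  # blocks until 4 bytes or EOF
            if len(header) < header_size:      # EOF (possibly mid-header)
                return
            (length,) = struct.unpack(fmt, header)
            payload = stream.read(length)
            if len(payload) < length:          # peer closed mid-message
                raise ConnectionError("connection closed mid-message")
            yield payload

# Usage against the server above (address taken from the answer):
# with socket.create_connection(("192.168.1.5", 5000)) as s:
#     for data in iter_messages(s):
#         print(len(data), set(data))
```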
    

    Output:

    290 {0}
    619 {1}
    450 {2}
    238 {3}
    803 {4}
    779 {5}
    358 {6}
    245 {7}
    782 {8}
    90 {9}
    108 {10}
    370 {11}
    346 {12}
    783 {13}
    21 {14}
    646 {15}
    53 {16}
    912 {17}
    678 {18}
    794 {19}
    617 {20}
    76 {21}
    279 {22}
    634 {23}
    479 {24}
    189 {25}
    117 {26}
    865 {27}
    940 {28}
    477 {29}
    596 {30}
    700 {31}
    354 {32}
    734 {33}
    272 {34}
    511 {35}
    669 {36}
    816 {37}
    43 {38}
    948 {39}
    256 {40}
    829 {41}
    48 {42}
    193 {43}
    721 {44}
    132 {45}
    334 {46}
    494 {47}
    952 {48}
    768 {49}
    103 {50}
    657 {51}
    354 {52}
    298 {53}
    262 {54}
    713 {55}
    386 {56}
    325 {57}
    676 {58}
    391 {59}
    707 {60}
    162 {61}
    462 {62}
    575 {63}
    237 {64}
    502 {65}
    488 {66}
    28 {67}
    588 {68}
    2 {69}
    660 {70}
    937 {71}
    452 {72}
    412 {73}
    24 {74}
    320 {75}
    464 {76}
    457 {77}
    71 {78}
    213 {79}
    208 {80}
    969 {81}
    204 {82}
    542 {83}
    179 {84}
    779 {85}
    393 {86}
    630 {87}
    492 {88}
    75 {89}
    592 {90}
    630 {91}
    946 {92}
    447 {93}
    497 {94}
    644 {95}
    626 {96}
    629 {97}
    840 {98}
    614 {99}
    29 {100}
    876 {101}
    644 {102}
    318 {103}
    285 {104}
    936 {105}
    416 {106}
    281 {107}
    888 {108}
    529 {109}
    249 {110}
    89 {111}
    250 {112}
    868 {113}
    454 {114}
    303 {115}
    506 {116}
    337 {117}
    747 {118}
    582 {119}
    248 {120}
    321 {121}
    746 {122}
    536 {123}
    491 {124}
    739 {125}
    234 {126}
    532 {127}
    115 {128}
    691 {129}
    939 {130}
    464 {131}
    603 {132}
    12 {133}
    920 {134}
    467 {135}
    697 {136}
    441 {137}
    673 {138}
    466 {139}
    595 {140}
    697 {141}
    801 {142}
    971 {143}
    986 {144}
    615 {145}
    591 {146}
    556 {147}
    734 {148}
    904 {149}
    256 {150}
    865 {151}
    993 {152}
    942 {153}
    3 {154}
    152 {155}
    404 {156}
    840 {157}
    253 {158}
    558 {159}
    917 {160}
    326 {161}
    29 {162}
    713 {163}
    841 {164}
    191 {165}
    432 {166}
    100 {167}
    936 {168}
    185 {169}
    586 {170}
    736 {171}
    474 {172}
    400 {173}
    619 {174}
    933 {175}
    588 {176}
    512 {177}
    79 {178}
    437 {179}
    504 {180}
    115 {181}
    321 {182}
    982 {183}
    288 {184}
    6 {185}
    531 {186}
    538 {187}
    929 {188}
    790 {189}
    769 {190}
    567 {191}
    647 {192}
    258 {193}
    807 {194}
    966 {195}
    936 {196}
    715 {197}
    259 {198}
    301 {199}
    718 {200}
    743 {201}
    281 {202}
    448 {203}
    776 {204}
    498 {205}
    575 {206}
    569 {207}
    534 {208}
    42 {209}
    15 {210}
    436 {211}
    260 {212}
    156 {213}
    180 {214}
    580 {215}
    99 {216}
    647 {217}
    767 {218}
    983 {219}
    274 {220}
    23 {221}
    81 {222}
    918 {223}
    584 {224}
    55 {225}
    302 {226}
    750 {227}
    410 {228}
    685 {229}
    710 {230}
    197 {231}
    498 {232}
    682 {233}
    906 {234}
    687 {235}
    599 {236}
    761 {237}
    471 {238}
    269 {239}
    791 {240}
    241 {241}
    822 {242}
    364 {243}
    591 {244}
    74 {245}
    245 {246}
    143 {247}
    113 {248}
    305 {249}
    512 {250}
    148 {251}
    755 {252}
    437 {253}
    167 {254}
    559 {255}
    

    (Can you tell I'm bored at home during the COVID-19 outbreak?)