I'm new to networking and I'm trying to write a simple TCP client. The first thing I have to do is to send my message length and wait for further instructions. Here's my code:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((TCP_IP, TCP_PORT))
s.send(bytearray(message_serialised_length))
data = s.recv(BUFFER_SIZE)
s.close()
print("response: {}".format(struct.unpack('>I', data)))
Sometimes I'm able to get a response and the program continues, but other times I run into this error:
struct.error: unpack requires a buffer of 4 bytes
I'm not sure why this happens or how to fix it. What steps should I take, and how can I better understand this problem? I tried removing the struct library and using .decode() instead, but that gave me empty values.
You most likely received more than 4 bytes (or fewer), and unpack requires exactly four bytes for '>I'. TCP is a streaming protocol with no built-in message boundaries. If the server sends 4 bytes for a message size and then sends the message, and you call recv(BUFFER_SIZE), you will get anywhere from 1 to BUFFER_SIZE bytes, or zero bytes if the server closed the connection. You have to check that you have at least 4 bytes in the buffer, strip off and convert those bytes to a length, then keep buffering the results of further recv calls until you have the number of bytes specified by the message length.
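The buffering described above can be sketched as a pair of small helpers. This is only a minimal sketch: the names recv_exact and recv_message are made up for illustration, the '>I' format is taken from the question's code, and the demo uses socket.socketpair() as a stand-in for a real server.

```python
import socket
import struct

def recv_exact(sock, size):
    """Return exactly `size` bytes from `sock`, or raise if the peer closes early."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)  # may return fewer bytes than requested
        if not chunk:                 # b'' means the peer closed the connection
            raise ConnectionError(
                f'connection closed with {remaining} bytes still expected')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def recv_message(sock):
    """Read a 4-byte big-endian length, then that many payload bytes."""
    (length,) = struct.unpack('>I', recv_exact(sock, 4))
    return recv_exact(sock, length)

if __name__ == '__main__':
    # Assumption: a local socket pair stands in for the client/server connection.
    a, b = socket.socketpair()
    with a, b:
        payload = b'hello'
        a.sendall(struct.pack('>I', len(payload)) + payload)
        print(recv_message(b))
```

Asking recv for only the bytes still missing means the helper never reads past the end of the current message, so no leftover buffer has to be carried between calls. The trade-off is more recv calls than a version that reads in large chunks and keeps the surplus.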
Here's a clear example. The following server code waits for a client connection, then sends a random number (1 to 1000) of bytes with value 0, then a random number (1 to 1000) of bytes with value 1, and so on up to a final byte value of 255. Each run of same-valued bytes is sent in its own .sendall() call.
The client simply connects to the server and calls .recv(4096) in a loop, printing how many bytes it received along with the set of unique values received. Seeing the same value in two different .recv calls indicates that one send was split across two receives. Seeing more than one value in a single .recv indicates that several sends were combined into one receive.
server.py
from socket import *
import random

s = socket()
s.bind(('', 5000))
s.listen()
while True:
    c, a = s.accept()
    print(f'Connection from {a}')
    with c:
        for i in range(256):
            length = random.randint(1, 1000)
            data = bytes([i] * length)  # `length` bytes, all with value i
            c.sendall(data)             # one sendall() per value
    print('Disconnected.')
client.py
from socket import *

s = socket()
s.connect(('192.168.1.5', 5000))
while True:
    data = s.recv(4096)
    if not data:  # b'' means the server closed the connection
        break
    print(len(data), set(data))
Output:
473 {0} # All the zeros were recv'd in the first recv
1704 {1, 2, 3} # Three sends were combined
4096 {4, 5, 6, 7, 8, 9, 10, 11, 12} # 12s were split across two packets
4096 {12, 13, 14, 15, 16}
4096 {16, 17, 18, 19, 20, 21, 22, 23, 24, 25}
4096 {32, 33, 34, 35, 36, 25, 26, 27, 28, 29, 30, 31}
4096 {36, 37, 38, 39, 40, 41, 42}
4096 {42, 43, 44, 45, 46, 47, 48, 49, 50, 51} # 51 was split across recvs.
3495 {51, 52, 53, 54, 55, 56, 57, 58, 59} # Didn't get a full 4096.
4096 {64, 65, 66, 60, 61, 62, 63} # Note: sets are unordered, so the numbers aren't sequential
1472 {66, 67, 68, 69, 70, 71} # Rarely gets the full 4096 after this point.
839 {72, 73}
1577 {74, 75, 76}
1610 {80, 77, 78, 79}
1566 {81, 82}
124 {83}
285 {84}
2087 {85, 86, 87}
4096 {88, 89, 90, 91, 92, 93}
168 {93}
1270 {94, 95}
1737 {96, 97, 98}
1593 {99, 100, 101}
1674 {102, 103}
1290 {104, 105, 106, 107}
1266 {108, 109}
3499 {110, 111, 112, 113, 114, 115}
2694 {116, 117, 118, 119, 120, 121, 122} # split 122s even without a full buffer.
666 {122, 123}
1532 {124, 125, 126, 127}
1065 {128, 129}
1121 {130, 131}
1601 {132, 133, 134, 135}
1507 {136, 137}
1055 {138, 139}
1663 {140, 141}
1286 {144, 145, 142, 143}
1796 {146, 147, 148}
2199 {149, 150, 151}
1636 {152, 153, 154}
809 {155, 156}
1462 {157, 158}
576 {159}
656 {160, 161}
1693 {162, 163, 164, 165}
1460 {168, 169, 166, 167}
27 {169}
1679 {170, 171, 172, 173}
505 {176, 174, 175}
736 {177, 178}
3910 {179, 180, 181, 182, 183, 184, 185}
1452 {186, 187, 188}
1772 {192, 189, 190, 191}
2112 {193, 194, 195, 196}
1712 {200, 197, 198, 199}
1769 {201, 202}
1024 {203, 204}
984 {205, 206}
339 {207}
1869 {208, 209}
1819 {210, 211}
624 {212}
1474 {216, 213, 214, 215}
1619 {217, 218, 219}
842 {220, 221, 222}
746 {224, 223}
1453 {225, 226}
4096 {227, 228, 229, 230, 231, 232, 233}
1514 {233, 234, 235, 236}
1554 {237, 238}
1677 {240, 241, 239}
1450 {242, 243}
522 {244, 245}
377 {246}
1919 {247, 248, 249, 250, 251}
1437 {252, 253, 254, 255}
Note that this was run between two separate computers (a "real" situation). Using "localhost" and testing client and server on the same machine often doesn't fragment the sends at all, which is unfortunate because people learn and test on a single machine and then wonder why their code breaks later.
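If you only have one machine, one way to see splitting anyway is to pass a very small size to recv. The sketch below is an illustration under assumptions, not part of the run above: read_in_pieces is a made-up helper name, and socket.socketpair() stands in for a real connection. Two 5-byte sends are read back at most 3 bytes at a time, so the pieces cannot line up with the sends.

```python
import socket

def read_in_pieces(sock, piece_size):
    """Call recv(piece_size) until the peer closes; return the list of chunks."""
    chunks = []
    while True:
        data = sock.recv(piece_size)  # returns at most piece_size bytes
        if not data:                  # b'' means the sender closed its side
            return chunks
        chunks.append(data)

if __name__ == '__main__':
    # Assumption: a local socket pair replaces the two-computer setup.
    sender, receiver = socket.socketpair()
    with sender, receiver:
        sender.sendall(bytes([0] * 5))   # first "message"
        sender.sendall(bytes([1] * 5))   # second "message"
        sender.shutdown(socket.SHUT_WR)  # end of stream, so recv() returns b''
        for chunk in read_in_pieces(receiver, 3):
            print(len(chunk), set(chunk))
```

The exact chunk sizes are up to the operating system, but since no piece can exceed 3 bytes, at least four receives are needed for the ten bytes, and the boundaries of the two sends are not preserved.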
To actually implement messages in TCP, one technique is the one the original post's code is attempting: send the message size, followed by the message. Here's an example of that:
server.py
from socket import *
import random
import struct

s = socket()
s.bind(('', 5000))
s.listen()
while True:
    c, a = s.accept()
    print(f'Connection from {a}')
    with c:
        for i in range(256):
            length = random.randint(1, 1000)
            data = bytes([i] * length)
            c.sendall(struct.pack('<I', length))  # send 4-byte length (sendall, since send may be partial)
            c.sendall(data)                       # send actual message
    print('Disconnected.')
client.py
from socket import *
import struct

class Client:
    '''Buffer the TCP stream and extract meaningful data on message boundaries.'''

    def __init__(self, addr, port):
        self.sock = socket()
        self.sock.connect((addr, port))
        self.buffer = b''

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.sock.close()

    def get_raw(self, size):
        '''Request a specific number of bytes to extract from the stream.'''
        while len(self.buffer) < size:
            data = self.sock.recv(4096)
            if not data:  # server closed
                return b''
            self.buffer += data
        # Split off the requested bytes and keep the rest for the next call.
        msg, self.buffer = self.buffer[:size], self.buffer[size:]
        return msg

    def get(self):
        '''Extract a message from the stream.'''
        raw = self.get_raw(4)  # Get the 4-byte length data
        if not raw:
            return b''
        length = struct.unpack('<I', raw)[0]
        return self.get_raw(length)  # Return the actual message

with Client('192.168.1.5', 5000) as c:
    while True:
        data = c.get()
        if not data:
            break
        print(len(data), set(data))
Output:
290 {0}
619 {1}
450 {2}
238 {3}
803 {4}
779 {5}
358 {6}
245 {7}
782 {8}
90 {9}
108 {10}
370 {11}
346 {12}
783 {13}
21 {14}
646 {15}
53 {16}
912 {17}
678 {18}
794 {19}
617 {20}
76 {21}
279 {22}
634 {23}
479 {24}
189 {25}
117 {26}
865 {27}
940 {28}
477 {29}
596 {30}
700 {31}
354 {32}
734 {33}
272 {34}
511 {35}
669 {36}
816 {37}
43 {38}
948 {39}
256 {40}
829 {41}
48 {42}
193 {43}
721 {44}
132 {45}
334 {46}
494 {47}
952 {48}
768 {49}
103 {50}
657 {51}
354 {52}
298 {53}
262 {54}
713 {55}
386 {56}
325 {57}
676 {58}
391 {59}
707 {60}
162 {61}
462 {62}
575 {63}
237 {64}
502 {65}
488 {66}
28 {67}
588 {68}
2 {69}
660 {70}
937 {71}
452 {72}
412 {73}
24 {74}
320 {75}
464 {76}
457 {77}
71 {78}
213 {79}
208 {80}
969 {81}
204 {82}
542 {83}
179 {84}
779 {85}
393 {86}
630 {87}
492 {88}
75 {89}
592 {90}
630 {91}
946 {92}
447 {93}
497 {94}
644 {95}
626 {96}
629 {97}
840 {98}
614 {99}
29 {100}
876 {101}
644 {102}
318 {103}
285 {104}
936 {105}
416 {106}
281 {107}
888 {108}
529 {109}
249 {110}
89 {111}
250 {112}
868 {113}
454 {114}
303 {115}
506 {116}
337 {117}
747 {118}
582 {119}
248 {120}
321 {121}
746 {122}
536 {123}
491 {124}
739 {125}
234 {126}
532 {127}
115 {128}
691 {129}
939 {130}
464 {131}
603 {132}
12 {133}
920 {134}
467 {135}
697 {136}
441 {137}
673 {138}
466 {139}
595 {140}
697 {141}
801 {142}
971 {143}
986 {144}
615 {145}
591 {146}
556 {147}
734 {148}
904 {149}
256 {150}
865 {151}
993 {152}
942 {153}
3 {154}
152 {155}
404 {156}
840 {157}
253 {158}
558 {159}
917 {160}
326 {161}
29 {162}
713 {163}
841 {164}
191 {165}
432 {166}
100 {167}
936 {168}
185 {169}
586 {170}
736 {171}
474 {172}
400 {173}
619 {174}
933 {175}
588 {176}
512 {177}
79 {178}
437 {179}
504 {180}
115 {181}
321 {182}
982 {183}
288 {184}
6 {185}
531 {186}
538 {187}
929 {188}
790 {189}
769 {190}
567 {191}
647 {192}
258 {193}
807 {194}
966 {195}
936 {196}
715 {197}
259 {198}
301 {199}
718 {200}
743 {201}
281 {202}
448 {203}
776 {204}
498 {205}
575 {206}
569 {207}
534 {208}
42 {209}
15 {210}
436 {211}
260 {212}
156 {213}
180 {214}
580 {215}
99 {216}
647 {217}
767 {218}
983 {219}
274 {220}
23 {221}
81 {222}
918 {223}
584 {224}
55 {225}
302 {226}
750 {227}
410 {228}
685 {229}
710 {230}
197 {231}
498 {232}
682 {233}
906 {234}
687 {235}
599 {236}
761 {237}
471 {238}
269 {239}
791 {240}
241 {241}
822 {242}
364 {243}
591 {244}
74 {245}
245 {246}
143 {247}
113 {248}
305 {249}
512 {250}
148 {251}
755 {252}
437 {253}
167 {254}
559 {255}
(Can you tell I'm bored @ home during COVID-19 outbreak?)
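As an aside, Python can do the buffering for you: socket.makefile('rb') wraps a blocking socket in a buffered file object whose read(n) keeps reading until it has n bytes or the stream ends. The following is a hedged sketch of the same length-prefixed protocol on top of that. The function name read_message is made up here, the '<I' format matches the server above, and the demo again uses a local socket pair rather than the real server.

```python
import socket
import struct

def read_message(stream):
    """Read one length-prefixed message from a buffered binary stream.

    Returns None when the stream ends cleanly before a new header, so an
    empty message (b'') can be told apart from a closed connection.
    """
    header = stream.read(4)
    if not header:
        return None
    if len(header) < 4:
        raise EOFError('stream ended in the middle of a length header')
    (length,) = struct.unpack('<I', header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError('stream ended in the middle of a message')
    return payload

if __name__ == '__main__':
    # Assumption: a local socket pair stands in for the server above.
    a, b = socket.socketpair()
    with a, b, b.makefile('rb') as stream:
        for payload in (b'abc', b'', b'hello'):
            a.sendall(struct.pack('<I', len(payload)) + payload)
        a.shutdown(socket.SHUT_WR)
        while True:
            msg = read_message(stream)
            if msg is None:
                break
            print(len(msg), msg)
```

This relies on the socket being in blocking mode with no timeout set. With a timeout or a non-blocking socket, the file object's read can fail partway through and leave the stream in an inconsistent state, so the hand-rolled buffer is the safer choice there.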