Search code examples
csocketssendstraceposix-select

Two simultaneous sends locks up both programs


I'm debugging my application (sort of a follow up to an earlier question), which is essentially a toy peer to peer client. It works as follows:

  • Peer 1 requests a block (or several blocks) from Peer 2
  • Peer 2 receives the request, and sends the blocks back

And the cycle more or less repeats. This works great for smaller files, but with any file that has to be split into a larger number of chunks (say 250 chunks of 512 bytes) it dies.

Running strace on Peer 2 (the one that receives the requests) looks like so:

....    
[pid 11731] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f03d0ac9000
[pid 11731] lseek(400, 200704, SEEK_SET) = 200704
[pid 11731] read(400, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
[pid 11731] read(400, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 11731] sendto(5, "SF\0\0\1\212\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 522, 0, NULL, 0) = 522
[pid 11731] select(6, [4 5], NULL, NULL, NULL) = 1 (in [5])
[pid 11731] recvfrom(5, "BB\0\0\0\t\0\0\1\213\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, NULL) = 300
[pid 11731] open("test.dat", O_RDONLY)  = 401
[pid 11731] fstat(401, {st_dev=makedev(8, 4), st_ino=9187328, st_mode=S_IFREG|0644, st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=20480, st_size=10485760, st_atime=2012/10/03-10:25:29, st_mtime=2012/10/03-10:25:34, st_ctime=2012/10/03-10:25:34}) = 0
[pid 11731] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f03d0ac8000
[pid 11731] lseek(401, 200704, SEEK_SET) = 200704
[pid 11731] read(401, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1536) = 1536
[pid 11731] read(401, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 11731] sendto(5, "SF\0\0\1\213\0\0\2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 522, 0, NULL, 

And the results of strace on Peer 1 (the one that sends the requests) looks like so:

....
[pid 11741] sendto(5, "BB\0\0\0\t\0\0\5\213\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, 0) = 300
[pid 11741] sendto(5, "BB\0\0\0\t\0\0\5\214\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, 0) = 300
[pid 11741] sendto(5, "BB\0\0\0\t\0\0\5\215\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, 0) = 300
[pid 11741] sendto(5, "BB\0\0\0\t\0\0\5\216\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, 0) = 300
[pid 11741] sendto(5, "BB\0\0\0\t\0\0\5\217\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, 0) = 300
[pid 11741] sendto(5, "BB\0\0\0\t\0\0\5\220\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, 0) = 300
[pid 11741] sendto(5, "BB\0\0\0\t\0\0\5\221\0\0\2\0test.dat\0\0\0\0\0\0\0\0\0\0"..., 300, 0, NULL, 0

Both die when doing sends. I'm not entirely sure why. If anyone can shed some light on this I'd really appreciate it!


Solution

  • Your sends are blocking because the peer isn't reading them. That causes the peer's receive buffer to fill, which causes the senders's send buffer to fill, which causes send() to block.