I have a simple question with code below. Hopefully I didn't make a mistake in the code.
I'm a network engineer, and I need to test certain Linux behavior of our business application's keepalives during network outages (I'm going to insert some iptables stuff later to jack with the connection; first I want to make sure I got the client & server right).
As part of a network failure test I'm conducting, I wrote a non-blocking Python TCP client and server that are supposed to blindly send messages to each other in a loop. To understand what's happening I am using loop counters.
The server's loop should be relatively straightforward. I loop through every fd that select() says is ready. I never even import sleep anywhere in my server's code. From this perspective, I don't expect the server's code to pause while it loops over the client's socket, but for some reason the server code pauses intermittently (more detail below).
I initially didn't put a sleep in the client's loop. Without a sleep on the client side, the server and client seem to be as efficient as I want. However, when I put a time.sleep(1) statement after the client does an fd.send() to the server, the TCP server code intermittently pauses while the client is sleeping.
My questions:
Should the server keep running without these pauses when the client hits time.sleep() in the client's fd.send() loop? If so, what am I doing wrong?
I'm running this on two RHEL6 Linux machines. To reproduce the issue...
Set SERVER_HOSTNAME and SERVER_DOMAIN in the client's code to be the hostname and domain of the server you're running this on.
After the client connects, you'll see messages as shown in EXHIBIT 1 scrolling quickly in the server's terminal. After a few seconds, the scrolling pauses intermittently when the client hits time.sleep(). I don't expect to see those pauses, but maybe I've misunderstood something.
EXHIBIT 1
---
LOOP_COUNT 0
---
LOOP_COUNT 1
---
LOOP_COUNT 2
---
LOOP_COUNT 3
CLIENTMSG: 'client->server 0'
---
LOOP_COUNT 4
---
LOOP_COUNT 5
---
LOOP_COUNT 6
---
LOOP_COUNT 7
---
LOOP_COUNT 8
---
LOOP_COUNT 9
---
LOOP_COUNT 10
---
LOOP_COUNT 11
---
If I wrote this test code correctly and the server shouldn't pause, why is the TCP server intermittently pausing while it polls the client's connection for data?
Answering my own question. My blocking problem was caused by calling select() with a non-zero timeout.
When I changed select() to use a zero-second timeout, I got expected results.
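A rough illustration of why the timeout matters: with a timeout of 0, select() becomes a pure poll that returns at once even when nothing is ready, while a positive timeout lets it sleep until something is ready or the timeout expires. The sketch below is Python 3 rather than the Python 2 of the listings, uses socket.socketpair() as a stand-in for the TCP connection, and time_select() is a hypothetical helper name. It only asks about readability on an idle socket, so nothing is ever ready.

```python
import select
import socket
import time

def time_select(timeout):
    """Call select() on an idle socket (nothing has been sent to it) and
    return (elapsed_seconds, readable_list)."""
    a, b = socket.socketpair()  # stand-in for the client connection
    try:
        start = time.monotonic()
        # Only ask about readability; an idle socket is never readable.
        readable, _, _ = select.select([a], [], [], timeout)
        return time.monotonic() - start, readable
    finally:
        a.close()
        b.close()

if __name__ == "__main__":
    for t in (0, 0.5):
        elapsed, readable = time_select(t)
        print("timeout=%s elapsed=%.3fs readable=%s" % (t, elapsed, readable))
```

The trade-off is that a loop built on a zero timeout never sleeps, so it keeps a CPU core busy even when there is no traffic.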
tcp_server.py
#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET
#from socket import MSG_OOB <--- for send()
from socket import socket
import socket as socket_module
import select
import fcntl
import os
host = ''
port = 9997
serv_sock = socket(AF_INET, SOCK_STREAM)
serv_sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # option is SO_REUSEADDR (SOCK_STREAM is a socket type)
serv_sock.bind((host, port))
serv_sock.listen(5)
fcntl.fcntl(serv_sock, fcntl.F_SETFL, os.O_NONBLOCK) # Make the socket non-blocking
sock_list = [serv_sock]
from_client_str = '__DEFAULT__'
to_client_idx = 0
loop_count = 0
while True:
    recv_ready_list, send_ready_list, exception_ready = select.select(sock_list, sock_list,
        [], 5)
    print "---"
    print "LOOP_COUNT", loop_count
    ## Read all sockets which are input-ready... might be client or server...
    for sock_fd in recv_ready_list:
        # accept() if we're reading on the server socket...
        if sock_fd is serv_sock:
            clientsock, clientaddr = sock_fd.accept()
            sock_list.append(clientsock)
        # read input from the client socket...
        else:
            try:
                from_client_str = sock_fd.recv(4096)
                if from_client_str=='':
                    # Client closed the socket: stop selecting on it and release the fd
                    print "CLIENT CLOSED SOCKET"
                    sock_list.remove(sock_fd)
                    sock_fd.close()
            except socket_module.error, e:
                print "WARNING RECV FAIL"
            print "from_client_str: '{0}'".format(from_client_str)
    for sock_fd in send_ready_list:
        # Skip the listening socket and any client socket closed above
        if sock_fd is not serv_sock and sock_fd in sock_list:
            try:
                to_client_str = "server->client: {0}\n".format(to_client_idx)
                sock_fd.send(to_client_str)
                to_client_idx += 1
            except socket_module.error, e:
                print "TO CLIENT SEND ERROR", e
    loop_count += 1
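The listing makes sockets non-blocking by calling fcntl() with F_SETFL and O_NONBLOCK directly, which sets the file-status flags to exactly O_NONBLOCK and clears any other settable flags (harmless on a freshly created socket). A sketch of the more common Python 3 idiom, assuming a listener on 127.0.0.1 with an OS-chosen port; make_nonblocking_listener() is a hypothetical helper name:

```python
import fcntl
import os
import socket

def make_nonblocking_listener(port=0):
    """Create a non-blocking TCP listener on localhost (port 0 = OS picks one)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR is the option intended here; SOCK_STREAM is a socket type.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))
    sock.listen(5)
    # Toggles only the non-blocking flag instead of overwriting the flag word.
    sock.setblocking(False)
    return sock

if __name__ == "__main__":
    s = make_nonblocking_listener()
    flags = fcntl.fcntl(s.fileno(), fcntl.F_GETFL)
    print("getblocking:", s.getblocking(), "O_NONBLOCK:", bool(flags & os.O_NONBLOCK))
    s.close()
```

socket.getblocking() needs Python 3.7 or later; the F_GETFL check confirms that the flag really is set on the underlying fd.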
tcp_client.py
#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM
from socket import gethostname, socket
import socket as socket_module
import select
import fcntl
import errno
import time
import sys
import os
## NOTE: Using this script to simulate a scheduler
SERVER_HOSTNAME = 'myHostname'
SERVER_DOMAIN = 'mydomain.local'
PORT = 9997
def handle_socket_error_continue(e):
    ## non-blocking socket info from:
    ## https://stackoverflow.com/a/16745561/667301
    print "HANDLE_SOCKET_ERROR_CONTINUE"
    err = e.args[0]
    if (err==errno.EAGAIN) or (err==errno.EWOULDBLOCK):
        print 'CLIENT DEBUG: No data input from server'
        return True
    else:
        print 'FROM SERVER RECV ERROR: {0}'.format(e)
        sys.exit(1)
c2s = socket(AF_INET, SOCK_STREAM) # Client to server socket...
c2s.connect(('.'.join((SERVER_HOSTNAME, SERVER_DOMAIN,)), PORT))
# Set socket non-blocking...
fcntl.fcntl(c2s, fcntl.F_SETFL, os.O_NONBLOCK)
to_srv_idx = 0
while True:
    socket_list = [c2s]
    # Get the list of sockets which can: take input, output, etc...
    recv_ready_list, send_ready_list, exception_ready = select.select(
        socket_list, socket_list, [])
    for sock_fd in recv_ready_list:
        assert sock_fd is c2s, "Strange socket failure here"
        #incoming message from remote server
        try:
            from_srv_str = sock_fd.recv(4096)
        except socket_module.error, e:
            ## https://stackoverflow.com/a/16745561/667301
            err_continue = handle_socket_error_continue(e)
            if err_continue is True:
                continue
        else:
            if len(from_srv_str)==0:
                print "SERVER CLOSED NORMALLY"
                sys.exit(0)
        ## NOTE: if we get this far, we successfully received from_srv_str.
        ## Anything caught above, is some kind of fail...
        print "from_srv_str: {0}".format(from_srv_str)
    for sock_fd in send_ready_list:
        #outgoing message to remote server
        if sock_fd is c2s:
            #to_srv_str = raw_input('Send to server: ')
            try:
                to_srv_str = 'client->server {0}'.format(to_srv_idx)
                sock_fd.send(to_srv_str)
                ##
                time.sleep(1) ## Client blocks the server here... Why????
                ##
                to_srv_idx += 1
            except socket_module.error, e:
                print "TO SERVER SEND ERROR", e
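In Python 3, the errno check in handle_socket_error_continue() is usually unnecessary, because a non-blocking recv() that would block raises BlockingIOError (the OSError subclass that covers EAGAIN and EWOULDBLOCK). A minimal sketch, with try_recv() as a hypothetical helper and a socket.socketpair() standing in for the client's TCP socket:

```python
import socket

def try_recv(sock, bufsize=4096):
    """Non-blocking recv: bytes on success, b"" if the peer closed the
    connection, or None if nothing is waiting right now."""
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # EAGAIN/EWOULDBLOCK: no data yet, not an error worth exiting over.
        return None

if __name__ == "__main__":
    a, b = socket.socketpair()
    a.setblocking(False)
    print(try_recv(a))   # nothing sent yet
    b.sendall(b"server->client: 0\n")
    print(try_recv(a))   # the queued message
    b.close()
    print(try_recv(a))   # peer closed the connection
    a.close()
```

Distinguishing None from b"" keeps the "no data yet" case separate from the "peer closed" case, which the original code handles in two different places.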
However, when I put a time.sleep(1) statement after the client does an fd.send() to the server, the TCP server code intermittently pauses while the client is sleeping.
AFAICT from running the provided code (nice self-contained example, btw), the server is behaving as intended.
In particular, the semantics of the select() call are that select() shouldn't return until there is something for the thread to do (or until its timeout expires). Having the thread block inside select() is a good thing when there is nothing that the thread can do right now anyway, since it prevents the thread from spinning the CPU for no reason.
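To see that select() returns as soon as there is work, rather than waiting out the full timeout, here is a small Python 3 sketch. It assumes a socket.socketpair() in place of the client connection and a threading.Timer that plays the client by sending one message after a delay; wait_for_data() is a hypothetical name.

```python
import select
import socket
import threading
import time

def wait_for_data(delay, timeout=5.0):
    """Send one message after `delay` seconds from a helper thread and time
    how long select() blocks (at most `timeout`) waiting for it."""
    server_side, client_side = socket.socketpair()
    sender = threading.Timer(delay, client_side.sendall, args=(b"client->server 0",))
    start = time.monotonic()
    sender.start()
    try:
        readable, _, _ = select.select([server_side], [], [], timeout)
        elapsed = time.monotonic() - start
        data = server_side.recv(4096) if readable else None
        return elapsed, data
    finally:
        sender.join()  # make sure the sender is done before closing sockets
        server_side.close()
        client_side.close()

if __name__ == "__main__":
    print(wait_for_data(0.3))
```

With wait_for_data(0.3), select() should come back after roughly 0.3 seconds even though the timeout is 5 seconds, because the socket became ready-for-read.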
So in this case, your server program has told select() that it wants select() to return only when at least one of the following conditions is true:
1. serv_sock is ready-for-read (which is to say, a new client wants to connect to the server now)
2. serv_sock is ready-for-write (I don't believe this ever actually happens on a listening socket, so this criterion can probably be ignored)
3. clientsock is ready-for-read (that is, the client has sent some bytes to the server and they are waiting in clientsock's buffer for the server thread to recv() them)
4. clientsock is ready-for-write (that is, clientsock has some room in its outgoing-data-buffer that the server could send() data into if it wants to send some data back to the client)
5. Five seconds have passed since the call to select() started blocking.
I see (via print-debugging) that when your server program blocks, it is blocking inside select(), which indicates that none of the 5 conditions above are being met during the blocking period.
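A quick way to check the first two conditions, sketched in Python 3 against a throwaway listener on 127.0.0.1 with an OS-chosen port (listener_readiness() is a hypothetical helper): the listening socket should become ready-for-read only once a connection is pending, and on Linux it is not reported as ready-for-write.

```python
import select
import socket

def listener_readiness():
    """Return (readable, writable) for a listening socket, before and after
    a client connects, as two tuples."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(5)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        r, w, _ = select.select([listener], [listener], [], 0)
        before = (listener in r, listener in w)
        client.connect(listener.getsockname())
        # A pending connection makes the listener readable: accept() won't block.
        r, w, _ = select.select([listener], [listener], [], 1.0)
        after = (listener in r, listener in w)
        return before, after
    finally:
        client.close()
        listener.close()

if __name__ == "__main__":
    print(listener_readiness())
```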
Why is that? Well, let's go down the list.
1. serv_sock is not ready-for-read (no new client is trying to connect)
2. serv_sock is not ready-for-write (as noted above, this probably never happens on a listening socket)
3. clientsock is not ready-for-read (the client program is sleeping, so it isn't sending any new data)
4. clientsock is not ready-for-write (because the client program is sleeping, it's only reading the data coming from the server intermittently, and the TCP layer guarantees lossless/in-order transmission, so once clientsock's outgoing-data-buffer is full, clientsock won't select as ready-for-write unless/until the client reads at least some data from its end of the connection)
5. Five seconds have not yet passed since the call to select() started blocking.
So is this behavior actually a problem for the server? In fact it is not, because the server will still be responsive to any other clients that connect to the server. In particular, select() will still return right away whenever serv_sock or any other client's socket selects as ready-for-read (or ready-for-write), so the server can handle the other clients just fine while waiting for your hacked/slow client to wake up.
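The fourth point can be reproduced on one machine. This Python 3 sketch uses socket.socketpair() (an AF_UNIX stream pair on Linux) as a deterministic stand-in for the TCP connection, so byte counts differ from TCP, where writability also waits for the peer's ACKs; fill_then_drain() is a hypothetical name. One end plays the server and keeps sending while the other end, the "sleeping client", reads nothing.

```python
import select
import socket

def fill_then_drain():
    """Fill one end's send buffer while the peer reads nothing, then drain it.

    Returns (bytes_sent, bytes_received, writable_when_full, writable_after_drain).
    """
    server_side, client_side = socket.socketpair()  # AF_UNIX stream pair on Linux
    server_side.setblocking(False)
    client_side.setblocking(False)
    try:
        # The "sleeping client" never calls recv(), so keep sending until the
        # kernel refuses with EAGAIN, which Python 3 raises as BlockingIOError.
        sent = 0
        while True:
            try:
                sent += server_side.send(b"x" * 4096)
            except BlockingIOError:
                break
        _, w, _ = select.select([], [server_side], [], 0)
        writable_when_full = server_side in w

        # The client "wakes up" and reads everything that is queued for it.
        received = 0
        while True:
            try:
                received += len(client_side.recv(65536))
            except BlockingIOError:
                break
        _, w, _ = select.select([], [server_side], [], 0)
        writable_after_drain = server_side in w
        return sent, received, writable_when_full, writable_after_drain
    finally:
        server_side.close()
        client_side.close()

if __name__ == "__main__":
    print(fill_then_drain())
```

While the buffer is full, select() has no reason to report the server's end as ready-for-write, which matches the pauses seen in the server's terminal.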
The hacked/slow client might be a problem for the user, but there's nothing the server can really do about that (short of forcibly disconnecting the client's TCP connection, or maybe printing out a log message requesting that someone debug the connected client program, I suppose :)).
I agree with EJP, btw: selecting on ready-for-write should only be done on sockets that you actually want to write some data to. If you don't actually have any desire to write to the socket ASAP, then it's pointless and counterproductive to instruct select() to return as soon as that socket is ready-for-write. The problem with doing so is that you're likely to spin the CPU a lot whenever any socket's outgoing-data-buffer is less than full (which, in most applications, is most of the time!). The user-visible symptom would be that your server program uses 100% of a CPU core even when it ought to be idle or mostly idle.
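One way to follow that advice, sketched in Python 3 under the assumption that the server keeps a per-socket dict of queued outbound bytes (pending, select_lists(), and flush_ready() are hypothetical names, not part of the original code): every socket is watched for reads, but a socket only goes into select()'s write list while it has data waiting to be sent.

```python
import select
import socket

def select_lists(sockets, pending):
    """Watch every socket for reads, but only sockets with queued
    outbound bytes (per the hypothetical `pending` dict) for writes."""
    rlist = list(sockets)
    wlist = [s for s in sockets if pending.get(s)]
    return rlist, wlist

def flush_ready(writable, pending):
    """Send as much queued data as each writable socket accepts right now."""
    for s in writable:
        data = pending.get(s)
        if not data:
            continue
        try:
            sent = s.send(data)
        except BlockingIOError:
            continue  # buffer filled up again; retry after a later select()
        pending[s] = data[sent:]

if __name__ == "__main__":
    a, b = socket.socketpair()
    a.setblocking(False)
    pending = {a: b"", b: b""}
    print(select_lists([a, b], pending)[1])  # nothing queued: empty write list
    pending[a] = b"server->client: 0\n"
    r, w = select_lists([a, b], pending)
    _, ready_w, _ = select.select(r, w, [], 1.0)
    flush_ready(ready_w, pending)
    print(b.recv(4096))
    a.close()
    b.close()
```

Because idle sockets drop out of the write list, select() can sleep until input arrives or until a socket with queued data has room to accept more.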