When I use sockets in blocking mode, I can have a simple system that does something like this:
client                  server
  A -------------------> B
         register
  A <------------------> B
     (various messages)
  A -------------------> B
        unregister
Just after the unregister message is sent, process A can quit immediately and yet B receives the message as expected.
If I turn on non-blocking mode on A's socket, B never receives unregister if A sends that message and then quits immediately. (I tested by adding a sleep(1) after sending unregister; in that case it works as expected.) So, more or less, my client cannot cleanly unregister itself.
Note: when B calls poll() on A's socket, I get a hang-up event (POLLHUP) right away, instead of the last unregister message followed by the hang-up.
I tried to add a call to turn blocking mode back on, and somehow it makes no difference. I use the following code to change the blocking mode:
int optval(1); // 1 turns non-blocking mode ON, 0 turns it OFF
ioctl(get_socket(), FIONBIO, &optval);
Just in case, I tried with fcntl() too, although I'm sure that tweaks the same flag as far as the kernel is concerned.
int flags(fcntl(get_socket(), F_GETFL));
flags |= O_NONBLOCK; // use this line to turn ON
flags &= ~O_NONBLOCK; // use this line to turn OFF
fcntl(get_socket(), F_SETFL, flags);
As a side note, I send and receive my messages using the read() and write() functions.
Update:
For those interested, the test is now in our git:
Server: https://sourceforge.net/p/snapcpp/code/ci/master/tree/snapwebsites/tests/test_shutdown_server.cpp
Client: https://sourceforge.net/p/snapcpp/code/ci/master/tree/snapwebsites/tests/test_shutdown_client.cpp
These use the snap library, mainly the snap_communicator which depends on tcp client/server:
tcp: https://sourceforge.net/p/snapcpp/code/ci/master/tree/snapwebsites/lib/tcp_client_server.cpp
communicator: https://sourceforge.net/p/snapcpp/code/ci/master/tree/snapwebsites/lib/snap_communicator.cpp
As you are discovering, send on a socket only queues the data to be sent. It doesn't actually mean the server got it. This is true for both blocking and non-blocking sockets.
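To illustrate the queuing behaviour, here is a small sketch that uses a local socketpair() as a stand-in for your TCP connection (that substitution is an assumption made to keep the example self-contained; send_then_close() is a hypothetical name). The write() call succeeds before anyone has read anything, so its return value says nothing about receipt by the peer.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Hypothetical demo: send `message` on one end of a local socket pair,
// close that end right away, and only then read on the other end.
// Returns whatever the peer end received (empty string on error).
std::string send_then_close(std::string const & message)
{
    int fds[2];
    if(socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
    {
        return std::string();
    }

    // write() returns as soon as the kernel has queued the bytes;
    // nobody has called read() on fds[1] yet
    ssize_t const sent(write(fds[0], message.data(), message.size()));
    close(fds[0]);
    if(sent < 0)
    {
        close(fds[1]);
        return std::string();
    }

    // the peer reads only now, after the sender is already gone
    std::string received;
    char buf[256];
    for(;;)
    {
        ssize_t const r(read(fds[1], buf, sizeof(buf)));
        if(r <= 0)
        {
            break; // 0 means the sender closed its end
        }
        received.append(buf, static_cast<size_t>(r));
    }
    close(fds[1]);
    return received;
}
```

With a local socket pair and an orderly close() the queued bytes still arrive. Over a real network, the gap between "queued" and "received" is where a message can get lost if the process goes away abruptly.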
Several possibilities:
1. Make sure you call close on the socket before your client program exits. You didn't say in your question whether this was happening, but it's probably a good idea.
2. If #1 doesn't work, use the SO_LINGER option on the socket. Set an appropriate timeout interval.
Something like the following:
struct linger ling;
ling.l_onoff = 1;
ling.l_linger = 3; // wait up to 3 seconds on close for queued data to finish being sent
setsockopt(s, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
3. Have the client wait for the server to close the connection before the client exits (recv will return 0 when the server closes the socket).
My recommendation is to make sure you have implemented #1. If that doesn't do it for you, evaluate #3. Use #2 if nothing else works.