Tags: c++, c, winsock, winsock2

Should a call to WSAResetEvent after WSAEnumNetworkEvents cause event to never be set again?


We have a thread which is reading off of a socket. We ran into an issue on a network with a little more latency than we are used to, where our read loop would seemingly stop getting notified of read events on the socket. Original code (some error checking removed):

HANDLE hEventSocket = WSACreateEvent();
WSAEventSelect(pIOParams->sock, hEventSocket, FD_READ | FD_CLOSE);
std::array<HANDLE, 2>   ahEvents;

// This is an event handle that can be signaled from another thread to
// get this read thread to exit
ahEvents[0] = pIOParams->hEventStop; 
ahEvents[1] = hEventSocket;
while(pIOParams->bIsReading)
{
    // wait for stop or I/O events
    DWORD dwTimeout = 30000; // in ms
    dwWaitResult = WSAWaitForMultipleEvents(ahEvents.size(), ahEvents.data(), FALSE, dwTimeout, FALSE);
    if(dwWaitResult == WSA_WAIT_TIMEOUT)
    {
        CLogger::LogPrintf(LogLevel::LOG_DEBUG, "CSessionClient", "WSAWaitForMultipleEvents time out");
        continue;  
    }
    if(dwWaitResult == WAIT_OBJECT_0) // check to see if we were signaled to stop from another thread
    {
         break;
    }
    if(dwWaitResult == WAIT_OBJECT_0 +1)
    {
        // determine which I/O operation triggered event
        if (WSAEnumNetworkEvents(pIOParams->sock, hEventSocket, &NetworkEvents) != 0)
        {
            int err = WSAGetLastError();
            CLogger::LogPrintf(LogLevel::LOG_WARN, "CSessionClient", "WSAEnumNetworkEvents failed (%d)", err);
            break;
        }

        // HERE IS THE LINE WE REMOVED THAT SEEMED TO FIX THE PROBLEM
        WSAResetEvent(hEventSocket);

        // Handle events on socket
        if (NetworkEvents.lNetworkEvents & FD_READ)
        {
             // Do stuff to read from socket
        }
        if (NetworkEvents.lNetworkEvents & FD_CLOSE)
        {
             // Handle that the socket was closed
             break;
        }
    }

}

Here is the issue: With WSAResetEvent(hEventSocket); in the code, sometimes the program works and reads all of the data from the server, but sometimes, it seems to get stuck in a loop receiving WSA_WAIT_TIMEOUT, even though the server appears to have data queued up for it.

While the program is looping receiving WSA_WAIT_TIMEOUT, Process Hacker shows the socket connected in a normal state.

Now we know that WSAEnumNetworkEvents will reset hEventSocket, but it doesn't seem like the additional call to WSAResetEvent should hurt. It also doesn't make sense that it permanently messes up the signaling. I would expect that perhaps we wouldn't get notified of the last chunk of data to be read, as data could have been read in between the call to WSAEnumNetworkEvents and WSAResetEvent, but I would assume that once additional data came in on the socket, the hEventSocket would get raised.

The stranger part of this is that we have been running this code for years, and we're only now seeing this issue.

Any ideas why this would cause an issue?


Solution

  • Calling WSAResetEvent() manually introduces a race condition that can put your socket into a bad state.

    After WSAEnumNetworkEvents() returns, the event is signaled again when new data arrives, or when unread data is left over from an earlier read, but ONLY if the socket is in the proper state to signal that event.

    If the event does get signaled before you call WSAResetEvent(), you lose that signal.

    Per the WSAEventSelect() documentation:

    Having successfully recorded the occurrence of the network event (by setting the corresponding bit in the internal network event record) and signaled the associated event object, no further actions are taken for that network event until the application makes the function call that implicitly reenables the setting of that network event and signaling of the associated event object.

    FD_READ: The recv, recvfrom, WSARecv, WSARecvEx, or WSARecvFrom function.

    ...

    Any call to the reenabling routine, even one that fails, results in reenabling of recording and signaling for the relevant network event and event object.

    ...

    For FD_READ, FD_OOB, and FD_ACCEPT network events, network event recording and event object signaling are level-triggered. This means that if the reenabling routine is called and the relevant network condition is still valid after the call, the network event is recorded and the associated event object is set. This allows an application to be event-driven and not be concerned with the amount of data that arrives at any one time.

    What that means is that if you manually reset the event after calling WSAEnumNetworkEvents(), the event will NOT be signaled again until AFTER you perform a read on the socket (which re-enables the signaling of the event for read operations) AND either new data arrives afterwards or you didn't read all of the data that was available.

    By resetting the event manually, you lose the signal that allows WSAWaitForMultipleEvents() to tell you to call WSAEnumNetworkEvents() so it can then tell you to read from the socket. Without that read, the event will never be signaled again when data is waiting to be read. The only other condition you registered that can signal the event is a socket closure.

    Since WSAEnumNetworkEvents() already resets the event for you, DON'T reset the event manually!
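
To make the lost signal concrete, here is a rough, single-threaded model of the FD_READ rules quoted above. It is only a sketch in portable C++, not Winsock code: ModelSocket, its members, and eventSignaledAfterRace are hypothetical names invented for illustration, and the scenario (the loop enumerates events while FD_READ is still enabled, then data lands just before the extra reset) is an assumed interleaving, not a trace of the asker's program.

```cpp
#include <algorithm>
#include <cstddef>

// Simplified model of the documented FD_READ behavior.
// All names here are hypothetical; none of them are Winsock APIs.
struct ModelSocket {
    std::size_t unread = 0;   // bytes waiting in the receive buffer
    bool armed = true;        // FD_READ recording/signaling is enabled
    bool recorded = false;    // FD_READ bit in the internal event record
    bool signaled = false;    // state of the associated event object

    // Record FD_READ and signal the event, but only while armed with data pending.
    void raiseIfPossible() {
        if (armed && unread > 0) {
            recorded = true;
            signaled = true;
            armed = false;    // no further action until recv() re-enables
        }
    }

    void dataArrives(std::size_t n) { unread += n; raiseIfPossible(); }

    // Models WSAEnumNetworkEvents: report and clear the record, reset the event.
    bool enumNetworkEvents() {
        bool sawRead = recorded;
        recorded = false;
        signaled = false;
        return sawRead;
    }

    // Models WSAResetEvent: only the event object changes.
    void resetEvent() { signaled = false; }

    // Models recv: re-enables FD_READ; level-triggered if data remains.
    std::size_t recv(std::size_t maxBytes) {
        std::size_t n = std::min(maxBytes, unread);
        unread -= n;
        armed = true;
        raiseIfPossible();
        return n;
    }
};

// Replays the assumed race window; returns true if the event ends up signaled.
bool eventSignaledAfterRace(bool manualReset) {
    ModelSocket s;                         // armed, nothing pending
    bool sawRead = s.enumNetworkEvents();  // loop enumerates: no FD_READ yet
    s.dataArrives(100);                    // data lands right after the enumeration
    if (manualReset) s.resetEvent();       // the extra WSAResetEvent call
    if (sawRead) s.recv(100);              // loop reads only when FD_READ was reported
    s.dataArrives(50);                     // later data cannot signal while disarmed
    return s.signaled;
}
```

In this model, eventSignaledAfterRace(true) leaves the event unsignaled with 150 bytes pending, so a wait on it would only ever time out, while eventSignaledAfterRace(false) leaves it signaled and the loop would wake up, enumerate FD_READ, and read. Calling recv on the stuck socket re-enables FD_READ and signals the event again if data remains, which matches the re-enabling rule in the documentation.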