I’ve been working on a TCP/IP IOCP server application.
I’ve been testing performance (which seems in line with TCP throughput testing utilities) and now have been testing data integrity – this is where I am getting some “weirdness”.
As an initial test, I had a test client send a 1 MB block of data over and over, where the block is just a sequence of integers, each one greater than the previous by 1. The idea is that I can verify each received buffer on its own: it must contain consecutive values with nothing missing, independent of any other buffer received, so I don't need to worry about the order in which threads handle the completed receives. To verify, I extract the first integer in the buffer and scan forward, resetting the expected value to 0 when I encounter the maximum value the client sends. I also check that the number of bytes received in each buffer is a multiple of 4 (as they are 4-byte integers).
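A minimal sketch of the check described above, in portable C++. This is not the code from the actual test: the name `isConsecutiveSequence` and the `maxValue` parameter (the largest value the client sends before wrapping back to 0) are hypothetical, and the integers are assumed to be 32-bit values in the host's byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true when `data` holds an unbroken run of
// 32-bit counters. `maxValue` is an assumed parameter: the largest value the
// client sends before it wraps back to 0.
bool isConsecutiveSequence(const char* data, std::size_t length, std::uint32_t maxValue)
{
    // The received byte count must be a whole number of 4-byte integers.
    if (length % sizeof(std::uint32_t) != 0)
        return false;

    const std::size_t count = length / sizeof(std::uint32_t);
    if (count == 0)
        return true;

    // The first integer in the buffer seeds the expected value.
    std::uint32_t expected = 0;
    std::memcpy(&expected, data, sizeof(expected));

    for (std::size_t i = 0; i < count; ++i)
    {
        std::uint32_t value = 0;
        std::memcpy(&value, data + i * sizeof(value), sizeof(value));
        if (value != expected)
            return false; // a gap or corrupted value inside this buffer

        // Wrap back to 0 after the maximum value, otherwise count up by 1.
        expected = (value == maxValue) ? 0 : value + 1;
    }
    return true;
}
```

Because the scan starts from whatever value happens to be first in the buffer, each buffer can be checked independently of the others, which is the property the test relies on.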
Occasionally, random blocks of data seem to be missing from within a buffer: the values go up one by one, and then a bunch are skipped. The code seems quite simple, though, and there is not much of it. I originally wrote the test in Delphi, but after hitting these issues I rewrote it in Visual Studio 2010 C++ and seem to have the same issue (or at least a very similar one).
There is obviously more code in the real system, but I can boil it down to pretty much this worker thread, which just handles the completed receives, verifies the data in the buffer and then posts the receives again. After I initially accept the connection, I create two overlapped structures, allocate a 1 MB buffer for each and then call WSARecv for each of them. I've double-checked that I'm not accidentally sharing the same buffer between the two. Then the following is pretty much what runs, reusing these:
DWORD numberOfBytesTransferred = 0;
ULONG_PTR completionKey = 0;
PMyOverlapped overlapped = nullptr;
while (true)
{
    auto queueResult = GetQueuedCompletionStatus(iocp, &numberOfBytesTransferred, &completionKey, (LPOVERLAPPED *)&overlapped, INFINITE);
    if (queueResult)
    {
        switch (overlapped->operation)
        {
            case tsoRecv:
            {
                verifyReceivedData(overlapped, numberOfBytesTransferred); // Checks the data is a sequence of integers, each incremented by 1, with no gaps
                overlapped->overlapped = OVERLAPPED(); // Reset the OVERLAPPED structure to defaults
                DWORD flags = 0;
                numberOfBytesTransferred = 0;
                auto returnCode = WSARecv(socket, &(overlapped->buffer), 1, &numberOfBytesTransferred, &flags, (LPWSAOVERLAPPED) overlapped, nullptr);
                break;
            }
            default:;
        }
    }
}
Maybe I am not handling some kind of error or additional information in my simple test above? I originally had an IOCP client sending data, but then wrote another, extremely simple one in Delphi using Indy blocking sockets. It's basically one line of code once connected:
while true do
begin
  IdTCPClient.IOHandler.WriteDirect(TIdBytes(BigData), Length(BigData));
end;
I also wrote another server using a different asynchronous socket component, and it hasn't detected problems with the received data the way my IOCP example above does, at least not yet. I can post more code and possibly a version to compile, but thought I would post the above in case I've missed something obvious. I think using one receive and one send per socket works OK, but it's my understanding that it's valid to post more than one to improve performance.
I believe this is solved. Most of my assumptions and code were correct; however, it seems that for a particular socket there cannot be simultaneous calls from multiple threads to WSASend or WSARecv. There can be multiple outstanding calls for both sends and receives on a particular socket, but the actual calls that initiate them need to be serialized with a critical section (or similar). This was a slight misunderstanding of the MSDN documentation on my part: I thought it could be done, but that you wouldn't know which buffer would be filled first without some additional synchronization (and my test didn't care which got filled first). It appears it's simply not safe at all unless the calls are made one at a time, and it can result in corrupted data within the buffers.
The only change I have made to the code is to add a critical section per connection to protect the calls to WSASend and WSARecv, and so far I have had no issues. It might be possible to protect WSASend and WSARecv separately, but I have not tested that yet.
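The per-connection locking described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the actual server: `Connection`, `postLock` and `postSerialized` are hypothetical names, and `std::mutex` stands in for the Windows critical section so the sketch stays portable (it needs a newer compiler than Visual Studio 2010).

```cpp
#include <mutex>
#include <utility>

// Hypothetical per-connection state. `postLock` plays the role of the
// per-connection critical section (std::mutex is used here for portability).
struct Connection
{
    std::mutex postLock;

    // Runs `post` while holding the connection's lock, so that only one thread
    // at a time initiates an operation for this connection. In the server,
    // `post` would be a callable that issues WSARecv or WSASend on the
    // connection's socket; here it can be any callable.
    template <typename PostFn>
    auto postSerialized(PostFn&& post) -> decltype(post())
    {
        std::lock_guard<std::mutex> guard(postLock);
        return std::forward<PostFn>(post)();
    }
};
```

In a worker thread, the re-post after a completed receive would then go through something like `connection.postSerialized([&] { return WSARecv(...); })`. Only the call that initiates the operation is made under the lock; the receive itself still completes asynchronously, so several operations can remain outstanding on the socket.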
I had posted a far more in-depth question related to this here that has more code examples.