
Why are I/O Completion Port Packets Queued in FIFO order if they may be dequeued in a different order?


Microsoft's documentation for I/O Completion Ports states:

Please note that while the [completion] packets are queued in FIFO order they may be dequeued in a different order.

It is my understanding that a thread obtains a completion packet from a completion port by calling GetQueuedCompletionStatus. Why does the system queue packets to a completion port in a FIFO order if it does not guarantee packets will be retrieved in a FIFO order?


Solution

  • The statement you quoted is there to make you aware that you need to do your own sequencing if ordering matters between I/O completions on a single socket. You might need this if you issue multiple WSARecv calls on a single socket. When they complete, the completions go into the IOCP queue in FIFO order, which is the order in which the WSARecv calls were issued.

    If you keep reading that document you will see this piece:

    Threads that block their execution on an I/O completion port are released in last-in-first-out (LIFO) order, and the next completion packet is pulled from the I/O completion port's FIFO queue for that thread. This means that, when a completion packet is released to a thread, the system releases the last (most recent) thread associated with that port, passing it the completion information for the oldest I/O completion.

    This shows that completions are removed from the IOCP in FIFO order. The reason for the first note is that, if you have multiple threads waiting on an IOCP, thread scheduling may mean that your code processes the completions in a different order from the order in which they were retrieved from the IOCP.

    Imagine you have 2 threads servicing an IOCP and a single TCP socket with 3 WSARecvs pending. Enough data comes in from the network to complete all three pending WSARecvs and so you end up with three completions in the IOCP; we'll call them A, B & C. These are in the order that the WSARecv calls were issued and so the data in the buffers A, B & C should be processed in order to maintain the sanity of the TCP stream.

    The first of your IOCP threads will be given completion A. The second thread will be given completion B. Depending on your hardware (number of cores, etc.) and the OS scheduler, either thread 1 or thread 2 may get to run next, or both may run at the same time. This can cause problems in the situation above: thread 2 may start processing buffer B before thread 1 has processed buffer A.

    I personally get around this by adding a sequence number to every buffer when writing servers that can issue multiple WSARecvs on a single socket. The sequence number is incremented and inserted in the buffer, and the WSARecv is issued, all inside the same lock so that the whole operation is atomic. When the completions occur, I either ensure that only one thread processes buffers for a given socket (see here) or use a 'sequenced buffer collection' which can ensure that the buffers are processed in the correct sequence (see here).

    Note also that, to ensure correctness, you need to lock around issuing WSARecv (and WSASend) calls on a given socket anyway (see here).
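
To make the sequencing idea concrete, here is a minimal, portable C++ sketch of one way a 'sequenced buffer collection' could work. It is not the answer author's actual code and it makes no Windows API calls: `SequencedSocket`, `IssueRead` and `OnCompletion` are hypothetical names, a `std::string` stands in for a receive buffer, and the point where a real server would call WSARecv is marked only by a comment. The assumption is that each completion carries the sequence number assigned when its read was issued.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical per-socket state; none of these names come from the Windows API.
class SequencedSocket {
public:
    // Call this where the real code would issue a read. The sequence number
    // is taken under the same lock that would cover the WSARecv call, so the
    // numbering always matches the order in which reads were issued.
    std::uint64_t IssueRead() {
        std::lock_guard<std::mutex> guard(issueLock_);
        const std::uint64_t seq = nextIssue_++;
        // A real server would store seq in its per-I/O buffer and call
        // WSARecv here, before the lock is released.
        return seq;
    }

    // Call this from any worker thread that has retrieved a completion.
    // The buffer is parked until every earlier buffer has arrived; the
    // handler then runs, in sequence order, for each deliverable buffer.
    // The handler runs while the lock is held, so only one thread at a time
    // processes data for this socket.
    void OnCompletion(std::uint64_t seq, std::string data,
                      const std::function<void(const std::string&)>& handler) {
        std::lock_guard<std::mutex> guard(orderLock_);
        pending_.emplace(seq, std::move(data));
        auto it = pending_.find(nextDeliver_);
        while (it != pending_.end()) {
            handler(it->second);
            pending_.erase(it);
            ++nextDeliver_;
            it = pending_.find(nextDeliver_);
        }
    }

private:
    std::mutex issueLock_;
    std::uint64_t nextIssue_ = 0;    // next sequence number to hand out
    std::mutex orderLock_;
    std::uint64_t nextDeliver_ = 0;  // next sequence number to process
    std::map<std::uint64_t, std::string> pending_;  // out-of-order arrivals
};
```

With this sketch, if the threads holding completions B and C reach `OnCompletion` before the thread holding A, both buffers are simply stored; when A arrives, the handler sees A, B and C in that order. Running the handler under the lock keeps the example short at the cost of serialising all processing for that socket, which is usually what a single TCP stream needs anyway.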