verbs: When does a send operation complete?


Using libibverbs, to do a send operation, these actions happen in this order:

  1. Receiver posts a receive request using ibv_post_recv()
  2. Sender posts a send request using ibv_post_send()
  3. A completion queue element IBV_WC_RECV gets pushed onto the completion queue of the receiver
  4. Receiver pops this element using ibv_poll_cq()
  5. A completion queue element IBV_WC_SEND gets pushed onto the completion queue of the sender
  6. Sender pops this element using ibv_poll_cq()

My questions are the following:

  1. Is the order correct and guaranteed? I am wondering about 3 and 5: is the sender's completion queue pushed first, or the receiver's?
  2. When is the memory buffer of the receiver updated? Is it between 2 and 3 or between 4 and 5?

Solution

  • You can't really put an exact order between things that happen on the sender and on the receiver, since the only way they causally interact is by sending network messages, which have a non-zero latency.

    Another way to think about what happens is:

    1. Receiver posts receive request
    2. Sender posts send request
    3. Send request starts executing; sender adapter sends a packet with SEND opcode over the network
    4. SEND packet is received and receiver adapter starts executing the receive request
    5. Receiver adapter places SEND payload in receive buffer and sends an ACK packet back to sender (more complex things can happen, such as ACK coalescing, but the overall idea is the same).

    From this point on, the spec does not require a strict order, since the ACK packet travels over the network independently of how the receiver adapter proceeds to its next step. The two things that happen next are:

    • Receiver adapter generates a receive completion and pushes it into the receiver completion queue
    • Sender adapter receives ACK packet and pushes a send completion onto the sender completion queue

    If the network latency is extremely low and the receiver adapter runs slowly for some reason, it is possible that by some external timebase the sender completion appears before the receive completion.

    The one guarantee that we can make is that the contents of the receive buffer are updated strictly before both the receive completion and the send completion are pushed. But there are corner cases: for example, if the network fails before delivering the ACK packet back to the sender, it's possible that no successful send completion is generated even though the receiver sees the buffer updated and the receive completion generated.
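
To make the race concrete, here is a toy timing model of the steps above. It is only a sketch: it does not call libibverbs, and every name in it (`send_timeline`, `buffer_written_at`, and so on) as well as the delay values you feed it are made up for illustration. It assumes the ACK leaves as soon as the payload is placed, and it models a lost ACK simply as "no successful send completion".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timing model, all values in nanoseconds, measured from the
 * moment the sender adapter puts the SEND packet on the wire.
 * None of these names exist in libibverbs. */
struct send_timeline {
    double wire_latency_ns;  /* one-way network latency, assumed symmetric */
    double rx_placement_ns;  /* receiver adapter writes payload to the buffer */
    double rx_cqe_delay_ns;  /* receiver adapter pushes its completion */
    double tx_cqe_delay_ns;  /* sender adapter pushes its completion after the ACK */
    bool   ack_delivered;    /* false models a network failure on the ACK path */
};

/* Time at which the payload is fully placed in the receive buffer. */
double buffer_written_at(const struct send_timeline *t)
{
    assert(t->wire_latency_ns > 0.0); /* network latency is never zero */
    return t->wire_latency_ns + t->rx_placement_ns;
}

/* The receive completion is pushed only after the buffer is written. */
double recv_completion_at(const struct send_timeline *t)
{
    return buffer_written_at(t) + t->rx_cqe_delay_ns;
}

/* Stores the time of the successful send completion in *when and returns
 * true, or returns false if the ACK never reaches the sender. */
bool send_completion_at(const struct send_timeline *t, double *when)
{
    if (!t->ack_delivered)
        return false;
    /* The ACK is sent once the payload is placed, then crosses the wire. */
    *when = buffer_written_at(t) + t->wire_latency_ns + t->tx_cqe_delay_ns;
    return true;
}

/* True if the sender sees its completion before the receiver sees its own. */
bool send_completes_first(const struct send_timeline *t)
{
    double send_at;
    if (!send_completion_at(t, &send_at))
        return false;
    return send_at < recv_completion_at(t);
}
```

With a slow wire and a fast receiver adapter (say 1000 ns latency and 50 ns to push the receive completion), `send_completes_first()` returns false. With a fast wire and a slow receiver adapter (say 100 ns latency and 5000 ns to push the receive completion), it returns true. In both cases `buffer_written_at()` is earlier than either completion, and with `ack_delivered` set to false the receiver side is unchanged while the sender never gets a successful completion.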