Search code examples
c#multithreadinglow-latency

should I use thread affinity for "latency-critical" threads?


In my HFT trading application I have several places where I receive data from network. In most cases this is just a thread that only receives and process data. Below is part of such processing:

    public Reciver(IPAddress mcastGroup, int mcastPort, IPAddress ipSource)
    {

        thread = new Thread(ReceiveData);

        s = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);
        s.ReceiveBufferSize = ReceiveBufferSize;

        var ipPort = new IPEndPoint(LISTEN_INTERFACE/* IPAddress.Any*/, mcastPort);
        s.Bind(ipPort);

        option = new byte[12];
        Buffer.BlockCopy(mcastGroup.GetAddressBytes(), 0, option, 0, 4);
        Buffer.BlockCopy(ipSource.GetAddressBytes(), 0, option, 4, 4);
        Buffer.BlockCopy(/*IPAddress.Any.GetAddressBytes()*/LISTEN_INTERFACE.GetAddressBytes(), 0, option, 8, 4);
    }

    public void ReceiveData()
    {
        byte[] byteIn = new byte[4096];
        while (needReceive)
        {
            if (IsConnected)
            {
                int count = 0;
                try
                {
                    count = s.Receive(byteIn);
                }
                catch (Exception e6)
                {
                    Console.WriteLine(e6.Message);
                    Log.Push(LogItemType.Error, e6.Message);
                    return;
                }
                if (count > 0)
                {
                    OnNewMessage(new NewMessageEventArgs(byteIn, count));
                }
            }
        }
    }

This thread works forever once created. I just wonder if I should configure this thread to run on certain core? As I need lowest latency I want to avoid context switch. As I want to avoid context switch I better to run the same thread on the same processor core, right?

Taking into account that i need lowest latency is that correct that:

  • It would be better to set "thread afinity" for the most part of the "long-running" threads?
  • It would be better to set "thread afinity" for the thread from my example above?

I rewriting above code to c++ right now to port to Linux later if this is important however I assume that my question is more about hardware than language or OS.


Solution

  • I think the algorithm that has as little latency as possible would be to pin your threads to one core and set them to realtime priority (or whatever is the highest one).

    This will cause the OS to evict any other thread which happens to use that core.

    Hopefully the CPU cache will still contain useful data when your thread gets scheduled there. For that reason I like the idea of pinning to a core.

    You should probably set your entire process to a high priority class and minimize other activity on your box. Also turn off unused hardware because it might generate interrupts. Fix your NIC's interrupts to a different CPU core (some better NICs can do that).