Tags: c, multithreading, winapi, caching

Prevent cache destruction by using thread affinity


I'm writing a program for Windows using the latest MSVC and the WinAPI. In short, I'm trying to speed up the processing of a block of raw data read from a file (in this case, building an application-specific data structure by parsing the block, then making a second pass over all of the data to verify its integrity).

I am thinking of using the multiple cores on the system to make this faster. My idea is to split the data into equal-sized chunks, one per core, and process each chunk on its own dedicated thread.
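A minimal sketch of the splitting step, assuming the data can be processed independently at any byte offset (a real parser may need to move each boundary to the next record edge). chunk_range and chunk_for are hypothetical names; each worker gets a half-open range whose size differs from the others by at most one byte.

```c
#include <stddef.h>

/* Half-open byte range [begin, end) assigned to one worker. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_range;

/* Split `total` bytes into `nchunks` contiguous ranges whose sizes differ by
   at most one byte. The first (total % nchunks) chunks get one extra byte.
   `nchunks` must be nonzero and `index` must be less than `nchunks`. */
static chunk_range chunk_for(size_t total, size_t nchunks, size_t index)
{
    size_t base  = total / nchunks;   /* minimum chunk size             */
    size_t extra = total % nchunks;   /* chunks that get one extra byte */
    chunk_range r;
    r.begin = index * base + (index < extra ? index : extra);
    r.end   = r.begin + base + (index < extra ? 1 : 0);
    return r;
}
```

For example, 10 bytes split three ways gives the ranges [0, 4), [4, 7) and [7, 10).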

I want to make this as efficient as possible, so here is my current thought process for the optimization:

  1. Create a worker thread for each chunk to be processed asynchronously
  2. Set each thread's affinity to a core (1st question)
  3. Since the OS will also schedule other threads on those cores, the cache locality my program has built up will be destroyed by other programs' code
  4. Loop through every thread in the system and set its affinity to exclude the cores I'm using (2nd question)
  5. Somehow do the same for system threads, maybe with KeSetSystemAffinityThreadEx?

1st question: I am not sure if I need to do this or not. Is there a benefit if the OS will schedule those threads to run across multiple cores anyway, balancing the work and the loads?

2nd question: I'm not sure how to do this properly (it seems like a really bad idea), but it was along these lines (using a 4-core CPU as an example): create 3 chunks and 3 worker threads, and make all other threads use CPU0.


I feel like this is not the correct way to approach this, and looking through the threading API for Windows I can't find anything that gives this much control over the system. What am I missing or not taking into account?


Solution

  • 1st question: I am not sure if I need to do this or not. Is there a benefit if the OS will schedule those threads to run across multiple cores anyway, balancing the work and the loads?

    Of course you don't need to do it. The system will schedule all your threads and they will get their work done if you do not set any affinity for them. That's by far the norm, in fact. And the system is very likely to schedule those threads on a variety of cores. You don't need thread affinity to get true concurrency.

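    Moreover, that may be the best you can hope for anyway: you can restrict your own threads to particular cores (SetThreadAffinityMask needs no elevated privileges for that), but you have no comparable control over where other processes' threads run. See also next.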

    2nd question: I'm not sure how to properly [set affinity for all the other threads in the system] (it seems like really bad idea)

    It's not only a bad idea, but an unworkable one. Unless you're the system itself, you cannot set affinity for other users' threads (including system threads). And even if you could, that would not prevent the system from later scheduling new threads on the cores you are trying to hog.

    What you would need to do is completely exclude a subset of cores from normal scheduling, and then set affinity for those. As far as I am aware, Windows does not have that capability, and certainly not the ability to enable it dynamically while the system is running.

    Since you ask, I suppose you recognize that your whole plan depends on giving your threads exclusive (or nearly exclusive) access to a subset of cores. If you can't get that, setting affinity will be worse for you than not setting it. If you really want to pursue this, you could consider increasing your threads' priority (raising it within the normal priority class needs no special rights, but the real-time priority class requires the SeIncreaseBasePriorityPrivilege privilege).

    I urge you, however, to save anything along these lines for a last resort, if you can't otherwise make your program fast enough.