I have an application that processes a lot of data.
When the working set exceeds the L2 (or L3) cache, performance falls dramatically.
I want to recover part of that by prefetching data.
I want to take advantage of the fact that two threads running on a hyper-threaded CPU can share one physical core and its cache.
The first thread (A) is the worker thread.
The second thread (B) prefetches data.
If I can force both threads to execute on the same core, thread (B) can run ahead and pull the data into the shared cache before thread (A) needs it.
Here's how it would look in pseudocode:
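procedure TWorkerThread.Execute;
var
  Node: TNode;
begin
  while not Terminated do
  begin
    Node := WalkTheDataTree.GetNode;
    Dowork(Node.MyData);   // process the current node
    SyncWithThreadB;       // signal B that A has advanced one node
  end;
end;

procedure TFetchThread.Execute;
var
  Node: TNode;
begin
  while not Terminated do
  begin
    WaitForThreadA;        // block until A signals
    Node := WalkTheDataTree_5_nodes_Ahead_of_A.GetNode; // prefetch: touching the node pulls it into the shared cache
  end;
end;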
Both threads execute in lockstep, with the worker thread running at full speed and the fetch thread waiting for a signal.
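The pseudocode leaves SyncWithThreadB and WaitForThreadA undefined. Below is a minimal sketch of one way to fill them in, assuming a Win32 counting semaphore as the signal: every release by A is one step that B may take, so B never misses a signal even when A gets ahead (a plain auto-reset event can drop signals in that case). A flat Integer array stands in for the data tree, and NodeCount, LookAhead and Data are hypothetical names. The sketch does not pin the threads, so it shows only the lockstep, not the cache sharing.

```pascal
program LockstepSketch;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows, System.SysUtils, System.Classes;

const
  NodeCount = 20; // hypothetical number of nodes in the "tree"
  LookAhead = 5;  // how far B runs ahead of A, as in the pseudocode

var
  StepSemaphore: THandle;                   // counts steps A has completed
  Data: array[0..NodeCount - 1] of Integer; // read-only stand-in for the tree

type
  TWorkerThread = class(TThread)
  protected
    procedure Execute; override;
  end;

  TFetchThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorkerThread.Execute;
var
  I, Sum: Integer;
begin
  Sum := 0;
  for I := 0 to NodeCount - 1 do
  begin
    Sum := Sum + Data[I] * Data[I];          // stand-in for Dowork(Node.MyData)
    ReleaseSemaphore(StepSemaphore, 1, nil); // SyncWithThreadB: allow B one step
  end;
  ReturnValue := Sum;
end;

procedure TFetchThread.Execute;
var
  I, Touched: Integer;
begin
  Touched := 0;
  for I := 0 to NodeCount - 1 do
  begin
    WaitForSingleObject(StepSemaphore, INFINITE); // WaitForThreadA
    if I + LookAhead < NodeCount then
      Touched := Touched + Data[I + LookAhead];   // read ahead of A to warm the cache
  end;
  ReturnValue := Touched; // keeps the read-ahead from being dead code
end;

var
  Worker: TWorkerThread;
  Fetcher: TFetchThread;
  I: Integer;
begin
  for I := 0 to NodeCount - 1 do
    Data[I] := I;
  // Initial count 0: B may not move until A has finished a node.
  StepSemaphore := CreateSemaphore(nil, 0, NodeCount, nil);
  Win32Check(StepSemaphore <> 0);
  Worker := TWorkerThread.Create(False);
  Fetcher := TFetchThread.Create(False);
  try
    Worker.WaitFor;
    Writeln('Fetch thread read-ahead sum: ', Fetcher.WaitFor);
  finally
    Worker.Free;
    Fetcher.Free;
    CloseHandle(StepSemaphore);
  end;
end.
```

The semaphore's maximum count is NodeCount because A releases it exactly once per node and B consumes one token per step, so the count can never exceed that.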
Is there a way to force two threads to run on the same core of a hyper-threaded CPU?
I'm using Delphi XE2.
P.S. I know how to detect whether the CPU supports hyper-threading using the CPUID instruction.
You simply call SetThreadAffinityMask, passing the handle of the thread you wish to constrain and the affinity mask for the target processor (bit n set means logical processor n). You can obtain the thread's handle from the TThread Handle property.
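A minimal sketch of that call in Delphi XE2, assuming a console program. TBusyThread and PinThread are hypothetical names, and the choice of logical processors 0 and 1 as siblings on one core is an assumption that has to be checked on the actual machine. The threads are created suspended so the affinity is already in place when they start running.

```pascal
program AffinitySketch;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows, System.SysUtils, System.Classes;

type
  TBusyThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TBusyThread.Execute;
begin
  Sleep(100); // placeholder for the real work
end;

// Hypothetical helper: restrict Thread to the single logical processor
// ProcessorIndex (0-based). Returns the thread's previous affinity mask.
function PinThread(Thread: TThread; ProcessorIndex: Integer): DWORD_PTR;
begin
  Result := SetThreadAffinityMask(Thread.Handle, DWORD_PTR(1) shl ProcessorIndex);
  if Result = 0 then
    RaiseLastOSError; // e.g. the processor is outside the process affinity mask
end;

var
  Worker, Fetcher: TBusyThread;
begin
  // Created suspended so the affinity is set before the threads run.
  Worker := TBusyThread.Create(True);
  Fetcher := TBusyThread.Create(True);
  try
    // Assumption: logical processors 0 and 1 are siblings on one core.
    PinThread(Worker, 0);
    PinThread(Fetcher, 1);
    Worker.Start;
    Fetcher.Start;
    Worker.WaitFor;
    Fetcher.WaitFor;
  finally
    Worker.Free;
    Fetcher.Free;
  end;
end.
```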
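Of course, you have to know which logical processors share a physical core, and that numbering is not the same on every system. On some machines the first N/2 logical processors are the physical cores and the second N/2 are their hyper-threaded counterparts, so on a quad core with 8 logical processors you would pair 0 and 4, 1 and 5, 2 and 6, or 3 and 7. On many Windows machines, however, the two hyper-threads of a core are numbered adjacently (0 and 1, 2 and 3, and so on). Rather than assume a layout, ask Windows: GetLogicalProcessorInformation returns one RelationProcessorCore entry per physical core, and that entry's ProcessorMask has a bit set for each logical processor on the core.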
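A sketch of querying the core layout, assuming Delphi XE2 and a machine with at most 64 logical processors (GetLogicalProcessorInformation only describes the processor group of the calling thread). To avoid depending on whether the RTL of a given Delphi version declares this API, the sketch declares its own mirror of SYSTEM_LOGICAL_PROCESSOR_INFORMATION (TLogicalProcInfo) and imports the function under the hypothetical name GetLogicalProcInfo. The union part of the record is kept only as padding because the sketch never reads it.

```pascal
program CoreLayoutSketch;
{$APPTYPE CONSOLE}
{$ALIGN 8}
uses
  Winapi.Windows, System.SysUtils;

const
  RelationProcessorCore = 0; // LOGICAL_PROCESSOR_RELATIONSHIP value

type
  // Local mirror of SYSTEM_LOGICAL_PROCESSOR_INFORMATION: 24 bytes on
  // Win32, 32 bytes on Win64 with 8-byte alignment.
  TLogicalProcInfo = record
    ProcessorMask: NativeUInt;
    Relationship: DWORD;
    Union: array[0..1] of UInt64; // padding for the unread union
  end;

function GetLogicalProcInfo(Buffer: Pointer; var ReturnedLength: DWORD): BOOL;
  stdcall; external kernel32 name 'GetLogicalProcessorInformation';

var
  Len: DWORD;
  Info: array of TLogicalProcInfo;
  I, Bit: Integer;
begin
  Len := 0;
  // The first call only reports the required buffer size in bytes.
  if not GetLogicalProcInfo(nil, Len) and (GetLastError <> ERROR_INSUFFICIENT_BUFFER) then
    RaiseLastOSError;
  SetLength(Info, Len div SizeOf(TLogicalProcInfo));
  if not GetLogicalProcInfo(@Info[0], Len) then
    RaiseLastOSError;
  for I := 0 to High(Info) do
    if Info[I].Relationship = RelationProcessorCore then
    begin
      // Each set bit is a logical processor on this physical core;
      // two bits mean the core is hyper-threaded.
      Write('Physical core, logical processors:');
      for Bit := 0 to SizeOf(NativeUInt) * 8 - 1 do
        if (Info[I].ProcessorMask shr Bit) and 1 <> 0 then
          Write(' ', Bit);
      Writeln;
    end;
end.
```

Instead of pinning each thread to one bit, you could also pass a single core's ProcessorMask to SetThreadAffinityMask for both threads: that keeps both on the same core and leaves the scheduler to choose which hyper-thread each one uses.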
As general advice, you should avoid setting hard affinity masks. Scheduling threads is hard, and the system generally does it better than you because it can see all the threads, whereas you can only see yours. You may consider SetThreadIdealProcessor as an alternative: it only tells the scheduler which processor you would prefer, so the thread can still run elsewhere when that processor is busy.
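A sketch of the softer alternative, again assuming Delphi XE2 and assuming that logical processors 0 and 1 are siblings. The API is imported under the hypothetical name SetIdealProcessor so the sketch does not rely on how a particular RTL version declares it; the Win32 function returns the previous ideal processor, or $FFFFFFFF on failure.

```pascal
program IdealProcessorSketch;
{$APPTYPE CONSOLE}
uses
  Winapi.Windows, System.SysUtils, System.Classes;

const
  IdealProcessorFailed = DWORD($FFFFFFFF); // failure value of SetThreadIdealProcessor

function SetIdealProcessor(hThread: THandle; dwIdealProcessor: DWORD): DWORD;
  stdcall; external kernel32 name 'SetThreadIdealProcessor';

type
  TWorkThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorkThread.Execute;
begin
  Sleep(100); // placeholder for the real work
end;

var
  Worker, Fetcher: TWorkThread;
begin
  Worker := TWorkThread.Create(True);
  Fetcher := TWorkThread.Create(True);
  try
    // Hints only: the scheduler prefers these processors but may use others.
    // Assumption: logical processors 0 and 1 share one physical core.
    if SetIdealProcessor(Worker.Handle, 0) = IdealProcessorFailed then
      RaiseLastOSError;
    if SetIdealProcessor(Fetcher.Handle, 1) = IdealProcessorFailed then
      RaiseLastOSError;
    Worker.Start;
    Fetcher.Start;
    Worker.WaitFor;
    Fetcher.WaitFor;
  finally
    Worker.Free;
    Fetcher.Free;
  end;
end.
```

Because the ideal processor is only a preference, the two threads will usually, but not always, end up on the same core; that is the trade-off for not fighting the scheduler.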