The requirement: items to be processed are stored in a global queue. Several handler threads take items from the global queue and process them. A producer thread adds items to the global queue continuously and rapidly (much faster than all handler threads can process them). The handler threads are compute-intensive, so the best performance is when the CPU is fully used. I therefore use one additional countKeeping thread to keep the queue's length within a specific range, roughly from BOTTOM to TOP (just to keep memory usage from growing too much).
I use a ManualResetEvent to signal the 'can add to queue' status change. The global queue is:

    Queue<Object> mQueue = new Queue<Object>();
    ManualResetEvent waitingKeeper = new ManualResetEvent(false);
The handler thread is:
    void Handle()
    {
        while(true)
        {
            Object item = null;
            lock(mQueue)
            {
                if(mQueue.Count > 0)
                    item = mQueue.Dequeue();
            }
            if(item == null)
                continue; // queue was empty; without this check, 'item' would be unassigned
            // deal with item, compute-intensive
        }
    }
The producer thread calls the AddToQueue() function to add an item to mQueue:
    void AddToQueue(Object item)
    {
        waitingKeeper.WaitOne();
        lock(mQueue)
        {
            mQueue.Enqueue(item);
        }
    }
The countKeeping thread is mainly like the following:
    void KeepQueueingCount()
    {
        while(true)
        {
            // does not use 'lock(mQueue)',
            // because I don't need the exact count of the queue;
            // I just need the queue not to use too much memory
            if(mQueue.Count < BOTTOM)
                waitingKeeper.Set();
            else if(mQueue.Count > TOP)
                waitingKeeper.Reset();
            Thread.Sleep(1000);
        }
    }
Here the problem comes.
When I set BOTTOM and TOP to smaller numbers, like BOTTOM = 20, TOP = 100, it works well on a quad-core CPU (CPU utilization is high), but not so well on a single CPU (CPU utilization fluctuates a lot).
When I set BOTTOM and TOP to larger numbers, like BOTTOM = 100, TOP = 300, it works well on a single CPU, but not so well on a quad-core CPU.
In both environments and both conditions, memory usage is not excessive (around 50 MB at most).
Logically, larger BOTTOM and TOP values should help performance (as long as memory usage stays reasonable), because there are more items available for the handler threads. But the facts say otherwise.
I tried several ways to find the cause of the problem. I found that when I use lock(mQueue) in the keeping thread, it works well under both of the above CPU conditions.
The new countKeeping thread is mainly like this:
    void KeepQueueingCount()
    {
        bool canAdd = false;
        while(true)
        {
            lock(mQueue)
            {
                if(mQueue.Count < BOTTOM)
                    canAdd = true;
                else if(mQueue.Count > TOP)
                    canAdd = false;
            }
            if(canAdd)
                waitingKeeper.Set();
            else
                waitingKeeper.Reset();
            // I also made a change here:
            // when the 'can add' status changes, sleep longer;
            // when it does not change, sleep less.
            // But this is not the main reason.
            Thread.Sleep(1000);
        }
    }
So my questions are:

1. Without lock in the countKeeping thread, why does the range of the global queue affect performance (here, performance is mainly CPU utilization) differently under different CPU conditions?
2. With lock in the countKeeping thread, performance is good in both conditions. What does lock really affect here?
3. Is there a better way to keep the queue's length within a range than ManualResetEvent?

---UPDATE---
The producer thread's main part is as follows. STEP is the number of items fetched per database query. Querying proceeds successively and in order until all items have been queried.
    void Produce()
    {
        while(true)
        {
            // query STEP items from the database
            itemList = QuerySTEPFromDB();
            if(itemList.Count == 0)
            {
                // stop all handler threads:
                // wait for the handler threads to handle all items in the queue,
                // then all handler threads exit
                break;
            }
            foreach(var item in itemList)
                AddToQueue(item);
        }
    }
Your concurrent queue is a classic case where an atomic compare-and-swap (CAS) spinlock tends to do considerably better, given very high contention but very little time spent inside the lock (just the time to enqueue or dequeue).
https://msdn.microsoft.com/en-us/library/dd460716%28v=vs.110%29.aspx
It's also worth noting that .NET already provides a concurrent queue for you, ConcurrentQueue<T>, which uses that kind of atomic CAS spinlock design.
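As a minimal sketch (assuming your handler loop from the question, with Object items), using ConcurrentQueue<T> lets you drop the explicit lock entirely:

```csharp
using System;
using System.Collections.Concurrent;

class Example
{
    // Lock-free queue; Enqueue/TryDequeue synchronize internally via CAS.
    static ConcurrentQueue<Object> mQueue = new ConcurrentQueue<Object>();

    static void Handle()
    {
        while (true)
        {
            Object item;
            // TryDequeue atomically checks for an item and removes it,
            // replacing the lock + Count check + Dequeue sequence.
            if (mQueue.TryDequeue(out item))
            {
                // deal with item, compute-intensive
            }
        }
    }
}
```

Note that TryDequeue combines the emptiness check and the removal into one atomic operation, which also fixes the race in the original code where Count could change between the check and the Dequeue.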
Higher-level locks get very expensive if you have a very highly-contended shared resource that is only used for a brief amount of time.
If I use a crude visual analogy (with exaggerated, human-level time units): imagine you have a store and there's a line, but the clerks are really fast at their job, so the line moves every second.
If you use a critical section/mutex here, it's like every customer dozes off and takes a nap when they find it's not their turn yet. Then when it's their turn, someone has to wake them up: -"Hey you, it's your turn now! Wake up!" -"Wha--huh? Oh okay." As you can imagine, because of the additional time blocking/suspending threads, you can also tend to form bigger and bigger lines with threads waiting for their turn.
This is also why you're seeing fluctuations in CPU utilization. The threads can form a traffic jam around the lock and get suspended/put to sleep, and that takes away from CPU utilization while they're sleeping and waiting for their turn. This is also rather indeterministic, since multithreaded code does not execute in a perfectly pre-defined order, so you can see spikes if your design allows threads to pile up around locks. You might get lucky in one session and get fast performance in such time-sensitive cases, then get unlucky in another and get very poor performance. In worst-case scenarios, excessive locking can actually push CPU utilization below that of single-threading. I once worked in a legacy codebase where the devs had multithreaded as an afterthought, sprinkling mutexes around everything, and I was often looking at 10% CPU utilization in performance-critical areas on a 2-core machine.
If you use a low-level spinlock here, it's like the customers don't doze off when they find there's a line. They just stand around waiting very impatiently and constantly looking to see if it's their turn. If the line is moving really fast, then this can work much better.
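For reference, .NET exposes this kind of busy-waiting lock directly as System.Threading.SpinLock. A sketch of guarding the queue with it (EnqueueItem is a hypothetical helper, not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class SpinLockSketch
{
    static Queue<Object> mQueue = new Queue<Object>();
    static SpinLock spin = new SpinLock();

    static void EnqueueItem(Object item)
    {
        bool taken = false;
        try
        {
            // Spins (busy-waits) instead of suspending the thread,
            // which pays off when the critical section is very short.
            spin.Enter(ref taken);
            mQueue.Enqueue(item);
        }
        finally
        {
            if (taken) spin.Exit();
        }
    }
}
```

The try/finally with the `taken` flag is the documented usage pattern for SpinLock, ensuring the lock is released even if the critical section throws.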
Your problem is somewhat unusual in that your producers produce much faster than the consumers consume. The upper limit idea on how much can be produced at once seems sensible here, but you can kind of throttle the processing that way. I'm also not sure why you solve that in a separate count keeper thread (didn't quite understand that part). I think you can just make it so your producers don't enqueue items if some upper limit is reached until the queue gets smaller.
You might want to keep that upper limit to avoid spamming memory, but I think you'll do better (after using the appropriate lock) to sleep or yield the producers to balance the processing distribution and skew it more towards your consumers whenever a producer enqueues an item to process. This way you don't end up jamming up your producers when they reach that limit -- instead the point is to avoid reaching that limit so that your consumers have a chance to consume at a rate that doesn't lag significantly behind the producers.
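One way to get both the upper bound and the producer throttling without a separate count-keeper thread or ManualResetEvent is BlockingCollection<T> with a bounded capacity. A sketch, assuming a capacity of 300 to match the TOP value from your tests:

```csharp
using System;
using System.Collections.Concurrent;

class BoundedQueueSketch
{
    // Bounded to 300 items: Add() blocks the producer automatically
    // whenever the queue is full, which replaces the countKeeping
    // thread and the ManualResetEvent entirely.
    static BlockingCollection<Object> mQueue =
        new BlockingCollection<Object>(boundedCapacity: 300);

    static void AddToQueue(Object item)
    {
        mQueue.Add(item); // blocks while the queue is at capacity
    }

    static void Handle()
    {
        // Blocks when the queue is empty (no busy-wait), wakes when the
        // producer adds an item, and the loop ends cleanly once the
        // producer calls mQueue.CompleteAdding().
        foreach (Object item in mQueue.GetConsumingEnumerable())
        {
            // deal with item, compute-intensive
        }
    }
}
```

This also gives your producer a natural shutdown signal: when QuerySTEPFromDB() returns nothing, call mQueue.CompleteAdding() and the handler loops will drain the remaining items and exit on their own.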