I have Gigabytes of data (stored in messages, each message is about 500KB) in a cloud queue (Azure) and data keeps coming.
I need to do some processing on each message. I've decided to create 2 background workers, one to get the data into memory, and the other to process that data:
GetMessage(CloudQueue cloudQueue, LocalQueue localQueue)
{
lock (localQueue)
{
localQueue.Enqueue(cloudQueue.Dequeue());
}
}
ProcessMessage(LocalQueue localQueue)
{
lock (localQueue)
{
Process(localQueue.Dequeue());
}
}
The issue is that data never stops coming so I'll be spending ALOT of time on synchronizing the local queue. Is there a known pattern for this type of problem?
You don't need to hold the lock while you process
Item i;
lock (localQueue)
{
i = localQueue.Dequeue();
}
Process(i);
Hence there should be little contention. If necessary, reduce the frequency with which the Producer takes the lock for enqueuing by batching the insertions: rather than the queue hold individual items have it hold batches. You effectively reduce the number of locks by a factor which is the average batch size. You can have a simple model of batching, say, every 10 or by time or some combination of time and threshold.