Is a thread-safe queue a good approach?

I am looking for a way to optimize the following code, for an open source project that I develop, or make it more performant by moving the heavy work to another thread.

void ProfilerCommunication::AddVisitPoint(ULONG uniqueId)
{
    CScopedLock<CMutex> lock(m_mutexResults);
    m_pVisitPoints->points[m_pVisitPoints->count].UniqueId = uniqueId;
    if (++m_pVisitPoints->count == VP_BUFFER_SIZE)
    {
        SendVisitPoints();
        m_pVisitPoints->count=0;
    } 
}

The above code is used by the OpenCover profiler (an open source code coverage tool for .NET written in C++) when each visit point is called. The mutex is used to protect some shared memory (a 64K block shared between several processes 32/64 bit and C++/C#) when full it signals the host process. Obviously this is quite heavy for each instrumentation point and I'd like to make the impact lighter.

I am thinking of using a queue which is pushed to by the above method and a thread to pop the data and populate the shared memory.

Q. Is there a thread-safe queue in C++ (Windows STL) that I can use - or a lock-less queue as I wouldn't want to replace one issue with another? Do people consider my approach sensible?

EDIT 1: I have just found concurrent_queue.h in the include folder - could this be my answer...?

Solution

Okay I'll add my own answer - concurrent_queue works very well

using the details described in this MSDN article I implemented concurrent queue (and tasks and my first C++ lambda expression :) ) I didn't spend long thinking though as it is a spike.

inline void AddVisitPoint(ULONG uniqueId) { m_queue.push(uniqueId); }

...
// somewhere else in code

m_tasks.run([this]
{
    ULONG id;
    while(true)
    {
         while (!m_queue.try_pop(id)) 
            Concurrency::Context::Yield();

        if (id==0) break; // 0 is an unused number so is used to close the thread/task
        CScopedLock<CMutex> lock(m_mutexResults);
        m_pVisitPoints->points[m_pVisitPoints->count].UniqueId = id;
        if (++m_pVisitPoints->count == VP_BUFFER_SIZE)
        {
            SendVisitPoints();
            m_pVisitPoints->count=0;
        }
    }
});

Results:

Application without instrumentation = 9.3
Application with old instrumentation handler = 38.6
Application with new instrumentation handler = 16.2