Thread IDs with PPL and Parallel Memory Allocation

I have a question about the Microsoft PPL library, and parallel programming in general. I am using FFTW to perform a large set (100,000) of 64 x 64 x 64 FFTs and inverse FFTs. In my current implementation, I use a parallel for loop and allocate the storage arrays within the loop. I have noticed that my CPU usage only tops out at about 60-70% in these cases. (Note this is still better utilization than the built in threaded FFTs provided by FFTW which I have tested). Since I am using fftw_malloc, is it possible that excessive locking is occurring which is preventing full usage?

In light of this, is it advisable to preallocate the storage arrays for each thread before the main processing loop, so no locks are required within the loop itself? And if so, how is this possible with the MSFT PPL library? I have been using OpenMP before, in that case it is simple enough to get a thread ID using supplied functions. I have not however seen a similar function in the PPL documentation.

Solution

I am just answering this because nobody has posted anything yet.

Mutex(e)s can wreak havoc on performance if heavy locking is required. In addition if a lot of memory (re)-allocation is needed, that can also decrease performance and limit it to your memory bandwidth. Like you said a preallocation which later threads operate on can be usefull. However this requires that you have a fixed threadcount and that you spread your workload balanced on all threads.

Concerning the PPL thread_id functions, I can only speak about Intel-TBB, which however should be pretty similiar to PPL. TBB - and I suppose also PPL - is not speaking of threads directly, instead they are talking about tasks, the aim of TBB was to abstract these underlaying details away from the user, thus it does not provide a thread_id function.