I have a big array; iterating over it and doing my work takes about 50 ms. The app I am developing will run on a Tegra 3 or another fast CPU. I split the work across four threads using pthread: I take the width of my array, divide it by the number of cores found in the system, and each thread iterates over a quarter of the array. Everything works, but the job now takes 80 ms. Any idea why the multithreaded approach is slower than the single-threaded one? If I lower the CPU count to 1, everything goes back to 50 ms.
for(int y = 0; y < height; y++)
{
    for(int x = 0; x < width; x++)
    {
        int index = (y*width)+x;
        int sourceIndex = source->getIndex(vertex_points[index].position[0]/ww, vertex_points[index].position[1]/hh);
        vertex_points[index].position[0] += source->x[sourceIndex]*ww;
        vertex_points[index].position[1] += source->y[sourceIndex]*hh;
    }
}
I am dividing the outer for loop of the code above into four parts based on the CPU count. vertex_points is a vector of positions.
So in each thread the outer loop looks like
for(int y = start; y < end; y++)
and start/end differ for each thread.
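For reference, here is a minimal sketch of how the per-thread start/end could be computed so that the rows are covered exactly once even when height is not a multiple of the thread count. RowRange and rowsForThread are hypothetical names made up for this illustration; they are not from the code above.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper type: half-open row range [start, end) for one thread.
struct RowRange {
    int start;
    int end;
};

// Splits `height` rows into `threadCount` contiguous ranges.
// The first (height % threadCount) threads each get one extra row,
// so no row is skipped and no row is processed twice.
RowRange rowsForThread(int height, int threadCount, int threadIndex) {
    assert(threadCount > 0 && threadIndex >= 0 && threadIndex < threadCount);
    int base = height / threadCount;
    int extra = height % threadCount;
    int start = threadIndex * base + std::min(threadIndex, extra);
    int end = start + base + (threadIndex < extra ? 1 : 0);
    return {start, end};
}
```

Each thread would then run the outer loop from its own start to its own end. Since every thread writes only to rows in its own range, the writes to vertex_points do not overlap.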
Thread startup time is typically on the order of milliseconds, and you pay it for every thread each time you launch them - that's what's eating your time.
With that in mind, 50 ms is not the kind of delay I'd worry about. If we were talking 5 seconds, that'd be a good candidate for parallelizing.
If the loop needs to be performed often, consider a solution with threads that are spun up early on and kept dormant, waiting for work to do. That'll run faster.
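A rough sketch of that idea, assuming C++11 is available: it uses std::thread and std::condition_variable rather than raw pthread calls, and the WorkerPool class and its run method are names invented for this example. The threads are created once, sleep on a condition variable, and are woken each time there is work.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical pool: threads are started once and reused for every job.
class WorkerPool {
public:
    explicit WorkerPool(int threadCount) {
        assert(threadCount > 0);
        for (int i = 0; i < threadCount; ++i)
            workers_.emplace_back([this, i] { workerLoop(i); });
    }

    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    // Runs job(threadIndex) once on every worker; blocks until all finish.
    void run(const std::function<void(int)>& job) {
        std::unique_lock<std::mutex> lock(mutex_);
        job_ = &job;
        pending_ = static_cast<int>(workers_.size());
        ++generation_;  // tells the dormant workers that new work exists
        wake_.notify_all();
        done_.wait(lock, [this] { return pending_ == 0; });
        job_ = nullptr;
    }

private:
    void workerLoop(int index) {
        unsigned seen = 0;  // last generation this worker has processed
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Dormant here until new work arrives or the pool shuts down.
            wake_.wait(lock, [&] { return stopping_ || generation_ != seen; });
            if (stopping_) return;
            seen = generation_;
            const std::function<void(int)>* job = job_;
            lock.unlock();
            (*job)(index);  // do the work without holding the lock
            lock.lock();
            if (--pending_ == 0) done_.notify_one();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_, done_;
    const std::function<void(int)>* job_ = nullptr;
    unsigned generation_ = 0;
    int pending_ = 0;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

You would create the pool once at startup and call run every time the loop has to execute, with each worker using its index to pick its own start/end rows. Whether this actually beats the single-threaded 50 ms still has to be measured on the target device.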
Also, is the CPU really 4-core? Honest cores or hyperthreading?
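One way to check what the system reports, as a small sketch assuming C++11: std::thread::hardware_concurrency() returns the number of hardware threads, which counts hyperthreads as well as physical cores, and it is allowed to return 0 when the value cannot be determined. usableThreadCount and detectThreadCount are hypothetical helper names.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: turns the reported value into a usable thread count,
// falling back to `fallback` when the runtime reports 0 (unknown).
int usableThreadCount(unsigned reported, int fallback) {
    assert(fallback > 0);
    return reported == 0 ? fallback : static_cast<int>(reported);
}

// Number of hardware threads reported by the standard library, or 1 if unknown.
// Note: this is logical processors, not necessarily physical cores.
int detectThreadCount() {
    return usableThreadCount(std::thread::hardware_concurrency(), 1);
}
```

If this reports more threads than there are physical cores, splitting a CPU-bound loop that many ways may not give a proportional speedup.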