I'm interested in optimizing my code for multithreaded computing. In terms of cache use, pipelining, and other aspects of memory access, how do the following three cases compare in how efficiently they use those resources?
Case 1
struct something {
    float a;
    float b;
    int c;
    bool d;
};
vector<something> vec(n, something());
for (int q = 0; q < n; q++)
{
    vec[q].a = expression1;
    vec[q].b = expression2;
    vec[q].c = expression3;
    vec[q].d = expression4;
}
Case 2
struct something {
    float a;
    float b;
    int c;
    bool d;
};
vector<something> vec(n, something());
for (int q = 0; q < n; q++)
    vec[q].a = expression1;
for (int q = 0; q < n; q++)
    vec[q].b = expression2;
for (int q = 0; q < n; q++)
    vec[q].c = expression3;
for (int q = 0; q < n; q++)
    vec[q].d = expression4;
Case 3
vector<float> a(n);
vector<float> b(n);
vector<int> c(n);
vector<bool> d(n);
for (int q = 0; q < n; q++)
    a[q] = expression1;
for (int q = 0; q < n; q++)
    b[q] = expression2;
for (int q = 0; q < n; q++)
    c[q] = expression3;
for (int q = 0; q < n; q++)
    d[q] = expression4;
Also, are there better ways of approaching the above?
If all the struct fields are different, then case 3 has a big advantage: its loops may be vectorizable, while the loop in case 1 generally is not.
This is because case 3 uses the struct-of-arrays layout, which stores all values of the same data type contiguously in memory and thereby exposes vectorization.
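As a rough illustration of the struct-of-arrays layout, here is a minimal sketch. The names `SomethingSoA` and `fill` are hypothetical, and the arithmetic in each loop is only a placeholder for `expression1` through `expression4` from the question. It also assumes `std::vector<char>` in place of `vector<bool>`, since `std::vector<bool>` is a bit-packed specialization and does not store one plain `bool` per element. Whether a given loop is actually vectorized depends on the compiler, the flags, and the expressions themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical struct-of-arrays container mirroring case 3.
// Assumption: std::vector<char> replaces vector<bool>, which is bit-packed.
struct SomethingSoA {
    std::vector<float> a;
    std::vector<float> b;
    std::vector<int> c;
    std::vector<char> d;

    explicit SomethingSoA(std::size_t n) : a(n), b(n), c(n), d(n) {}
    std::size_t size() const { return a.size(); }
};

// Each loop walks one contiguous array with unit stride and one element type.
// The right-hand sides are placeholders for the question's expressions.
void fill(SomethingSoA& s) {
    const std::size_t n = s.size();
    assert(s.b.size() == n && s.c.size() == n && s.d.size() == n);

    for (std::size_t q = 0; q < n; ++q)
        s.a[q] = 0.5f * static_cast<float>(q);
    for (std::size_t q = 0; q < n; ++q)
        s.b[q] = s.a[q] + 1.0f;
    for (std::size_t q = 0; q < n; ++q)
        s.c[q] = static_cast<int>(q) * 2;
    for (std::size_t q = 0; q < n; ++q)
        s.d[q] = (q % 2 == 0) ? 1 : 0;
}
```

Calling `fill` on a `SomethingSoA s(n)` fills all four arrays. To see whether the loops were vectorized, it is usually worth checking the compiler's optimization report or the generated assembly rather than assuming.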
EDIT:
Case 3 is potentially even more cache-friendly than case 1 because it doesn't need struct padding, so the data size is smaller.
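The padding claim can be checked with `sizeof`. The sketch below is only illustrative: the helper names `aos_bytes`, `soa_bytes`, and `bytes_saved` are hypothetical, and the actual sizes are implementation-defined. On a typical platform with 4-byte `float` and `int` and 1-byte `bool`, the struct is usually padded from 13 to 16 bytes, but that is an assumption about the target, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>

struct something {
    float a;
    float b;
    int c;
    bool d;
};

// Bytes per element when the struct is stored as-is (includes any padding).
constexpr std::size_t struct_bytes = sizeof(something);

// Bytes per element when each field lives in its own array of plain values.
constexpr std::size_t packed_bytes =
    sizeof(float) + sizeof(float) + sizeof(int) + sizeof(bool);

// Total payload for n elements in the array-of-structs layout (cases 1 and 2).
std::size_t aos_bytes(std::size_t n) { return n * struct_bytes; }

// Total payload for n elements in the separate-arrays layout (case 3),
// assuming one plain bool per element rather than a bit-packed vector<bool>.
std::size_t soa_bytes(std::size_t n) { return n * packed_bytes; }

// Bytes of padding avoided by the separate-arrays layout for n elements.
std::size_t bytes_saved(std::size_t n) {
    assert(struct_bytes >= packed_bytes);  // a struct is never smaller than its fields
    return aos_bytes(n) - soa_bytes(n);
}
```

With a bit-packed `vector<bool>`, the `d` array would be smaller still, at the cost of less straightforward element access.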