I am new to the gem5 simulator. I have a C application that I want to make run faster. The first thing I did was optimize it with several techniques such as loop unrolling and SIMD. As the next step, I intend to make it run on multiple cores (X86 and ARM), and for that I must use the gem5 simulator.
The application performs a radix-4 computation. So far I have managed to run it on single-core X86 and ARM systems, but now I want to run it on X86 or ARM systems with 4, 16, ... cores.
Could someone give me some hints or point me in the right direction? Thank you.
Here is an outline of the application:
void init_twiddle(int N)
{
    int i;
    for (i = 0; i < TWIDDLE_LIMIT; i++)
    {
        /* Fill the twiddle table */
    }
}
void init_LUT(int N)
{
    int i, j;
    LUT_n2 = malloc((1 + PMAX) * sizeof(int*));
    for (i = 0; i <= PMAX; i++) {
        for (j = 0; j < n; j++) {
            /* Calculate the radix parameters and put them in a table */
        }
    }
}
void bit_r4_reorder(float *x, float *y, int N)
{
    /* Bit reordering after calculating the radix-4 */
}
void radix4(float *x, float *y, int N)
{
    /* Function for the radix-4 computation */
}
int main()
{
    /* Call the previous functions */
}
The application does not know that it is running on a simulated system, so you can treat gem5 as a real machine and parallelize the code the usual way, for example with OpenMP or MPI.
If the system being modeled has these libraries (OpenMP or MPI) installed, they should work, at least in theory.
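As a rough illustration of the OpenMP route, here is a minimal sketch of how one radix-4 pass could be split across cores. The function name `radix4_first_stage` is hypothetical and not part of the code in the question. It assumes `x` and `y` hold the real and imaginary parts, that `N` is a multiple of 4, and it leaves out the twiddle multiplication to stay short. The point is only that each loop iteration touches its own four elements, so the iterations can run on different cores without locking.

```c
/* Hypothetical helper (not from the question): one radix-4
 * decimation-in-frequency pass over N complex points, with the twiddle
 * multiplication omitted for brevity.
 * x holds the real parts, y the imaginary parts; N must be a multiple of 4. */
void radix4_first_stage(float *x, float *y, int N)
{
    int q = N / 4;
    int i;

    /* Iteration i reads and writes only indices i, i+q, i+2q and i+3q,
     * so the iterations are independent and can be shared among threads.
     * Without -fopenmp the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (i = 0; i < q; i++) {
        /* Sums and differences of the elements half a block apart */
        float ar = x[i] + x[i + 2 * q],     ai = y[i] + y[i + 2 * q];
        float br = x[i + q] + x[i + 3 * q], bi = y[i + q] + y[i + 3 * q];
        float cr = x[i] - x[i + 2 * q],     ci = y[i] - y[i + 2 * q];
        float dr = x[i + q] - x[i + 3 * q], di = y[i + q] - y[i + 3 * q];

        x[i]         = ar + br;  y[i]         = ai + bi;  /* a + b  */
        x[i + q]     = cr + di;  y[i + q]     = ci - dr;  /* c - jd */
        x[i + 2 * q] = ar - br;  y[i + 2 * q] = ai - bi;  /* a - b  */
        x[i + 3 * q] = cr - di;  y[i + 3 * q] = ci + dr;  /* c + jd */
    }
}
```

With GCC this would be built with something like `gcc -O2 -fopenmp`, and the number of threads is normally set through the `OMP_NUM_THREADS` environment variable, which you would match to the number of cores configured in the simulated system. Whether the OpenMP runtime is available inside the simulation depends on how gem5 is set up, so treat this as a starting point rather than a tested recipe.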