Tags: fortran, intel, compiler-optimization, intel-fortran

Intel compiler flags for single core usage


I noticed what seems to me surprising behaviour with a Fortran code that mostly consists of matrix/matrix and matrix/vector multiplications.

Initially, the code was compiled with gfortran, and the multiplications were carried out with nested DO loops over the rows and columns of the matrices. I compiled the code using:

gfortran -c -g -O3 ...

The code ran on a single core of an 8-core i7 processor.
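
For context, the original loop-based multiplications were along the lines of the following sketch (the routine name, array names and the dense n-by-n shape are illustrative, not taken from the actual code):

! Matrix/vector product y = A*x with two nested DO loops
subroutine matvec_naive(n, a, x, y)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in)  :: a(n,n), x(n)
  double precision, intent(out) :: y(n)
  integer :: i, j

  y = 0.0d0
  do j = 1, n          ! columns
     do i = 1, n       ! rows
        y(i) = y(i) + a(i,j) * x(j)
     end do
  end do
end subroutine matvec_naive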

I then compiled my code with the Intel compiler using:

ifort -c -g -O3 ...

The code ran significantly faster, still using a single core. I then decided to optimize the code with the well-known DGEMM and DGEMV routines for the matrix/matrix and matrix/vector multiplications respectively, as sketched below.
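
The replacement calls look roughly like this (the array names a, b, c, x, y and the square n-by-n shapes are placeholders; only the argument order follows the standard BLAS interfaces):

! C := alpha*op(A)*op(B) + beta*C
call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)

! y := alpha*op(A)*x + beta*y
call dgemv('N', n, n, 1.0d0, a, n, x, 1, 0.0d0, y, 1)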

I then compiled using:

ifort -c -g -O3 ...

The resulting code works properly, but it uses all 8 cores of my i7 processor without any significant performance improvement. Is there a way to control the number of cores used by my code from the compilation command?


Solution

  • The compiler itself is not generating any parallel code. But the Intel Math Kernel Library (MKL), where DGEMM and friends live, does automatic parallelization and CPU dispatch.

    The MKL documentation says this:

    Use the following techniques to specify the number of OpenMP threads to use in Intel MKL:

    Set one of the OpenMP or Intel MKL environment variables:

      • OMP_NUM_THREADS
      • MKL_NUM_THREADS
      • MKL_DOMAIN_NUM_THREADS

    Call one of the OpenMP or Intel MKL functions:

      • omp_set_num_threads()
      • mkl_set_num_threads()
      • mkl_domain_set_num_threads()
      • mkl_set_num_threads_local()
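
    For example, to force MKL back onto a single core from inside the program, a minimal sketch could look like this (it assumes the code is linked against MKL so that mkl_set_num_threads and DGEMM are available; equivalently, set MKL_NUM_THREADS=1 or OMP_NUM_THREADS=1 in the environment before running the executable):

    program single_thread_mkl
      implicit none
      integer, parameter :: n = 1000
      double precision, allocatable :: a(:,:), b(:,:), c(:,:)

      allocate(a(n,n), b(n,n), c(n,n))

      ! Restrict MKL to one thread before the first BLAS call
      call mkl_set_num_threads(1)

      call random_number(a)
      call random_number(b)

      ! This DGEMM now runs on a single core
      call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, c, n)
    end program single_thread_mkl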