I have interfaced my C library with SWIG and can compile it with my setup.py. Here is the extension section:
surf_int_lib = Extension("_surf_int_lib",
                         ["surf_int_lib.i", "surf_int_lib.c"],
                         include_dirs=[numpy_include],
                         extra_compile_args=["-fopenmp"],
                         extra_link_args=['-lgomp'],
                         swig_opts=['-threads'])
In my library I use OpenMP for parallelization. When I call my routines, I get the correct number of threads, but they all contend for the GIL and effectively run one at a time. The routines do produce the correct output. I was under the impression that swig -threads releases the GIL when entering the library, so why do my functions not parallelize?
Here is an example of an OpenMP routine:
void gegenbauerval(double *x, int nx, double *cs, int ncs, double alpha, double *f, int nf)
{
    int j;
    #pragma omp parallel for default(shared) private(j)
    for (j = 0; j < nx; ++j) {
        f[j] = gegenbauerval_pt(x[j], cs, ncs, alpha);
    }
}
My interface file does not include any %threads or Py_BEGIN_ALLOW_THREADS calls. Do I need to release the GIL myself, and if so, how would I do that?
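For reference, swig -threads makes the generated wrappers release the GIL around every C call, so no per-function directive should be needed. To release the GIL explicitly for a single function instead, SWIG's %exception hook is one way to do it; a minimal sketch, assuming a header named surf_int_lib.h:

```
%module surf_int_lib
%{
#include "surf_int_lib.h"
%}

/* release the GIL around this one call so OpenMP threads can run freely */
%exception gegenbauerval {
    Py_BEGIN_ALLOW_THREADS
    $action
    Py_END_ALLOW_THREADS
}

%include "surf_int_lib.h"
```

`$action` expands to the actual call into the C library; the macros around it drop and reacquire the GIL.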
Update: I have numpy with OpenBLAS installed in a virtualenv, which I use for my calculations. It is the exact same Python interpreter as outside the virtualenv. If I run the following one-liner with the environment activated, it is not parallelized. However, if I run it with the standard installation, it works. So I am no longer sure what the real error is.
python -c "import surf_int.lib.surf_int_lib as slib;import numpy as np;a=np.random.randn(1e8);c=np.random.rand(23);x=slib.gegenbauerval(a,c,1.5); print x"
After further investigation, I found that this is an issue between OpenMP and OpenBLAS (at least version 0.2.8). After recompiling OpenBLAS 0.2.11 with the option USE_OPENMP=1, both the BLAS routines from numpy and my own extensions using OpenMP make use of all the CPUs set by the environment variable OMP_NUM_THREADS.
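The rebuild boils down to something like the following (the install prefix and the numpy rebuild step are assumptions; adjust to your setup):

```shell
# in the OpenBLAS 0.2.11 source tree: use OpenMP's thread pool
# instead of OpenBLAS's own pthreads pool
make USE_OPENMP=1
make PREFIX=/opt/openblas install

# rebuild numpy against this OpenBLAS, then both BLAS and the
# extension honor the same thread count
OMP_NUM_THREADS=4 python run_calculation.py
```

With USE_OPENMP=1 both libraries share one OpenMP runtime, which avoids the clash between OpenBLAS's internal threading and the OpenMP regions in the extension.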
The issue may be related to this bug report and/or the changelog entry for OpenBLAS 0.2.9.rc2.