Tags: python, numpy, openmp, openblas

SWIG with OpenMP and Python: does swig -threads need extra GIL handling?


I have a C library wrapped with SWIG, and I can compile it with my setup.py. Here is the extension section:

surf_int_lib = Extension("_surf_int_lib",
                   ["surf_int_lib.i", "surf_int_lib.c"],
                   include_dirs=[numpy_include],
                   extra_compile_args=["-fopenmp"],
                   extra_link_args=['-lgomp'],
                   swig_opts=['-threads']
                   )

In my library I use OpenMP for parallelization. When I call my routines, the correct number of threads is started, but they all seem to be held back by the GIL and do not run in parallel. The routines return the correct output. I was under the impression that swig -threads would release the GIL when entering the library. So why do my functions not parallelize?

Here is an example of an openmp routine:

void gegenbauerval(double *x, int nx, double *cs, int ncs, double alpha, double *f, int nf)
{
    int j;

    #pragma omp parallel for default(shared) private(j)
    for(j=0;j<nx;++j){
        f[j] = gegenbauerval_pt(x[j],cs,ncs, alpha);
    }
}

My interface file does not include any %threads or Py_BEGIN_ALLOW_THREADS calls. Do I need to release the GIL, and if so, how would I do that?

Update: I have numpy with OpenBLAS installed in a virtualenv, which I use for my calculations. It uses the exact same Python interpreter as the installation outside the virtualenv. If I run the following one-liner with the environment activated, it is not parallelized. However, if I run it with the standard installation, it works. So I am no longer sure what the real error is.

python -c "import surf_int.lib.surf_int_lib as slib;import numpy as np;a=np.random.randn(10**8);c=np.random.rand(23);x=slib.gegenbauerval(a,c,1.5); print(x)"
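
One way to check whether a call really uses several cores, independent of how many threads were started, is to compare the CPU time of the process with the elapsed wall time. The following is only a minimal sketch, assuming Python 3.3 or later; cpu_to_wall_ratio and busy are hypothetical names, and busy is a pure-Python stand-in for a call such as slib.gegenbauerval(a, c, 1.5). A ratio near 1 suggests the work ran on a single core, while a ratio near the thread count suggests it ran in parallel.

```python
import time


def cpu_to_wall_ratio(func, *args, **kwargs):
    """Call func once and return (result, cpu_seconds / wall_seconds).

    time.process_time() counts CPU time of the whole process, so work done
    by several native threads at once pushes the ratio above 1.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    # Guard against a zero wall time on very short calls.
    return result, cpu_used / max(wall_used, 1e-9)


def busy(n):
    """Hypothetical single-threaded workload used only for illustration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    _, ratio = cpu_to_wall_ratio(busy, 2000000)
    print("CPU/wall ratio: %.2f" % ratio)
```

Running the same measurement inside and outside the virtualenv would show whether the difference comes from the environment rather than from the SWIG wrapper.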

Solution

  • After further investigation, I found that this is an issue between openmp and openblas (at least version 0.2.8).

    After recompiling OpenBLAS 0.2.11 with the option USE_OPENMP=1, both the BLAS routines called from numpy and my own extensions using OpenMP make use of all CPUs, with the thread count set by the environment variable OMP_NUM_THREADS.

    The issue may be related to this bug report and/or the changelog entry of OpenBLAS 0.2.9.rc2.
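
Since the thread count is taken from OMP_NUM_THREADS, the variable can also be set from Python, as long as this happens before numpy or the extension module is imported, because the OpenMP runtime normally reads it once at start-up. This is a rough sketch under that assumption; configure_omp_threads is a hypothetical helper, not part of numpy, SWIG, or OpenBLAS.

```python
import os


def configure_omp_threads(n_threads=None):
    """Set OMP_NUM_THREADS for this process and return the value as a string.

    If n_threads is None, fall back to the number of CPUs reported by the OS.
    """
    if n_threads is None:
        n_threads = os.cpu_count() or 1
    os.environ["OMP_NUM_THREADS"] = str(n_threads)
    return os.environ["OMP_NUM_THREADS"]


# Set the variable first, then import the libraries that use OpenMP.
configure_omp_threads(4)
# import numpy as np
# import surf_int.lib.surf_int_lib as slib
```

Exporting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.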