Tags: python, mpi, mpi4py, openmdao, petsc

Unable to call PETSc/MPI-based external code in parallel OpenMDAO


I am writing an OpenMDAO problem that calls a group of external codes in a parallel group. One of these external codes is a PETSc-based Fortran FEM code. I realize this is potentially problematic, since OpenMDAO itself also uses PETSc. At the moment, I'm calling the external code from a component using Python's subprocess module.
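
For concreteness, the component currently looks roughly like this (a minimal sketch only; the executable name, file names, and variables are placeholders, and the OpenMDAO 1.x Component API is assumed):

    import subprocess

    from openmdao.api import Component


    class ExternalFEMComp(Component):

        def __init__(self):
            super(ExternalFEMComp, self).__init__()
            self.add_param('thickness', val=1.0)
            self.add_output('max_stress', val=0.0)

        def solve_nonlinear(self, params, unknowns, resids):
            # Write the input file for the external code (details omitted).
            # Launch the PETSc/MPI-based executable as a child process.
            # This is the call that fails once the parent script runs under mpirun.
            subprocess.check_call(['./fem_solver', 'input.dat'])
            # Parse the external code's output file (details omitted);
            # a dummy value stands in for the parsed result here.
            unknowns['max_stress'] = 0.0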

If I run my OpenMDAO problem in serial (i.e. python2.7 omdao_problem.py), everything, including the external code, works just fine. When I try to run it in parallel, however (i.e. mpirun -np 4 python2.7 omdao_problem.py), it works up until the subprocess call, at which point I get the error:

*** Process received signal ***
Signal: Segmentation fault: 11 (11)
Signal code: Address not mapped (1)
Failing at address: 0xe3c00
[ 0] 0   libsystem_platform.dylib            0x00007fff94cb652a _sigtramp + 26
[ 1] 0   libopen-pal.20.dylib                0x00000001031360c5 opal_timer_darwin_bias + 15469
 *** End of error message ***

I can't make much of this, but it seems reasonable to me that the problem comes from using an MPI-based Python code to call another MPI-enabled code. I've tried using a non-MPI "hello world" executable in the external code's place, and that can be called by the parallel OpenMDAO code without error. I do not need the external code to actually run in parallel, but I do need its PETSc solvers and such, hence the inherent reliance on MPI. (I suppose I could keep both an MPI-enabled and a non-MPI build of PETSc lying around? I'd prefer not to, as I can see that becoming a mess in a hurry.)

I found this discussion, which appears to present a similar issue (and further states that using subprocess in an MPI code, as I'm doing, is a no-no). In that case, it looks like MPI_Comm_spawn may be an option, even though it isn't intended for this use. Any idea whether that would work in the context of OpenMDAO? Are there other avenues to pursue to get this working? Any thoughts or suggestions are greatly appreciated.
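
For reference, the spawn route via mpi4py would look roughly like this (a sketch only; the executable name, arguments, and process count are placeholders, and the spawned code would need to call MPI_Comm_get_parent and disconnect on its side):

    from mpi4py import MPI

    # Spawn the external MPI executable as a separate set of processes.
    # COMM_SELF is used so that only the calling rank launches the child job.
    child = MPI.COMM_SELF.Spawn('./fem_solver',
                                args=['input.dat'],
                                maxprocs=1)

    # Synchronize with the child job before tearing down the
    # intercommunicator; a real code might exchange data here instead.
    child.Barrier()
    child.Disconnect()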


Solution

  • You don't need to call the external code as a subprocess. Wrap the Fortran code in Python using f2py and pass a comm object down into it. This docs example shows how to work with components that use a comm (a rough sketch of the comm-passing pattern is shown after this answer).

    You could use an MPI spawn if you want to. This approach has been done, but it's far from ideal. You will be much more efficient if you can wrap the code in memory and let OpenMDAO pass you a comm.
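
    A rough sketch of that in-memory approach, assuming the Fortran FEM code has been compiled into an f2py extension module (here the module name fem_solver, the routine run_fem, and its arguments are all placeholders):

        from openmdao.api import Component

        import fem_solver  # hypothetical f2py-wrapped Fortran FEM library


        class FEMComp(Component):

            def __init__(self):
                super(FEMComp, self).__init__()
                self.add_param('thickness', val=1.0)
                self.add_output('max_stress', val=0.0)

            def solve_nonlinear(self, params, unknowns, resids):
                # OpenMDAO hands each component a sub-communicator (self.comm).
                # py2f() converts the mpi4py communicator into the integer
                # handle that Fortran MPI (and PETSc's Fortran API) expects.
                fcomm = self.comm.py2f()
                unknowns['max_stress'] = fem_solver.run_fem(fcomm,
                                                            params['thickness'])

    On the Fortran side, the integer handle from py2f() can be used directly with the Fortran MPI bindings, e.g. as the communicator on which PETSc is initialized.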