Open MPI with CUDA installation issue


While trying to build Open MPI with CUDA support, make fails with the following errors:

btl_uct_module.c: In function ‘mca_btl_uct_reg_mem’:
btl_uct_module.c:214:22: error: ‘UCT_MD_MEM_ACCESS_REMOTE_GET’ undeclared (first use in this function)
         uct_flags |= UCT_MD_MEM_ACCESS_REMOTE_GET;
                      ^
btl_uct_module.c:214:22: note: each undeclared identifier is reported only once for each function it appears in
btl_uct_module.c:217:22: error: ‘UCT_MD_MEM_ACCESS_REMOTE_PUT’ undeclared (first use in this function)
         uct_flags |= UCT_MD_MEM_ACCESS_REMOTE_PUT;
                      ^
btl_uct_module.c:220:22: error: ‘UCT_MD_MEM_ACCESS_REMOTE_ATOMIC’ undeclared (first use in this function)
         uct_flags |= UCT_MD_MEM_ACCESS_REMOTE_ATOMIC;
                      ^
btl_uct_module.c:225:21: error: ‘UCT_MD_MEM_ACCESS_ALL’ undeclared (first use in this function)
         uct_flags = UCT_MD_MEM_ACCESS_ALL;
                     ^
Makefile:1912: recipe for target 'btl_uct_module.lo' failed
make[2]: *** [btl_uct_module.lo] Error 1
make[2]: Leaving directory '/home/usama/install/openmpi-4.0.1/opal/mca/btl/uct'
Makefile:2375: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/home/usama/install/openmpi-4.0.1/opal'
Makefile:1893: recipe for target 'all-recursive' failed
make: *** [all-recursive] Error 1

I used the following commands to configure and install:

./configure --prefix=/home/$USER/.openmpi --with-cuda
make all install

I am using the following configuration:

  • Ubuntu 16.04
  • CUDA 10.1
  • cuDNN 7.5
  • Open MPI 4.0.1

The odd thing is that the same installation on my local machine, which runs Ubuntu 18.04, completed without errors and works fine. Is this a compatibility issue? Any thoughts?


Solution

  • It turned out to be a compatibility issue after all. Building Open MPI 3.1.4 instead of 4.0.1 resolved the problem.
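
For reference, the rebuild with the older release could look roughly like the sketch below. It reuses the configure flags from the question and assumes the openmpi-3.1.4 release tarball is already in the current directory. The names OMPI_VERSION, PREFIX, DRY_RUN, run and build_openmpi are illustrative and not from the original post. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: rebuild with Open MPI 3.1.4 using the configure flags from the question.
# OMPI_VERSION, PREFIX and DRY_RUN are illustrative names (assumptions).
OMPI_VERSION="3.1.4"
PREFIX="$HOME/.openmpi"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_openmpi() {
    src="openmpi-${OMPI_VERSION}"
    # Assumes ${src}.tar.gz has already been downloaded into the current directory.
    run tar -xzf "${src}.tar.gz"
    run cd "$src"
    run ./configure --prefix="$PREFIX" --with-cuda
    run make all install
}

build_openmpi
```

After a real build, running `ompi_info --parsable --all | grep mpi_built_with_cuda_support:value` should show whether the resulting installation was built with CUDA support.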