I am trying to figure out why my version of Open MPI 1.6 does not work properly. I am using gcc-4.7.2 on CentOS 6.6. Given a toy program (hello.c)
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
    int taskID = -1;
    int NTasks = -1;
    /* MPI Initializations */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &taskID);
    MPI_Comm_size(MPI_COMM_WORLD, &NTasks);
    printf("Hello World from Task %i\n", taskID);
    MPI_Finalize();
    return 0;
}
and compiling it with mpicc hello.c and running mpirun -np 8 ./a.out, I get this output:
--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:
Local host: qmaster02.cluster
Device name: mlx4_0
Device vendor ID: 0x02c9
Device vendor part ID: 4103
Default device parameters will be used, which may result in lower
performance. You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.
NOTE: You can turn off this warning by setting the MCA parameter
btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
Hello World from Task 4
Hello World from Task 7
Hello World from Task 5
Hello World from Task 0
Hello World from Task 2
Hello World from Task 3
Hello World from Task 6
Hello World from Task 1
[headnode.cluster:22557] 7 more processes have sent help message help-mpi-btl-openib.txt / no device params found
[headnode.cluster:22557] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
If I run this using mvapich2-2.1 and gcc-4.7.2, I just get Hello World from Task N without any of these errors or warnings.
Looking at the libraries linked to a.out, I get:
$ ldd a.out
linux-vdso.so.1 => (0x00007fff05ad2000)
libmpi.so.1 => /act/openmpi-1.6/gcc-4.7.2/lib/libmpi.so.1 (0x00002b0f8e196000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003954800000)
libm.so.6 => /lib64/libm.so.6 (0x0000003955400000)
librt.so.1 => /lib64/librt.so.1 (0x0000003955c00000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003965000000)
libutil.so.1 => /lib64/libutil.so.1 (0x0000003964c00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003955000000)
libc.so.6 => /lib64/libc.so.6 (0x0000003954c00000)
/lib64/ld-linux-x86-64.so.2 (0x0000003954400000)
If I recompile it with mvapich2, I get:
$ ldd a.out
linux-vdso.so.1 => (0x00007fffcdbcb000)
libmpi.so.12 => /act/mvapich2-2.1/gcc-4.7.2/lib/libmpi.so.12 (0x00002af3be445000)
libc.so.6 => /lib64/libc.so.6 (0x0000003954c00000)
libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x000000395e800000)
libibmad.so.5 => /usr/lib64/libibmad.so.5 (0x0000003955400000)
librdmacm.so.1 => /usr/lib64/librdmacm.so.1 (0x0000003146400000)
libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x0000003955800000)
libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x0000003956000000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003954800000)
librt.so.1 => /lib64/librt.so.1 (0x0000003955c00000)
libgfortran.so.3 => /act/gcc-4.7.2/lib64/libgfortran.so.3 (0x00002af3beaf6000)
libm.so.6 => /lib64/libm.so.6 (0x00002af3bee0a000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003955000000)
libgcc_s.so.1 => /act/gcc-4.7.2/lib64/libgcc_s.so.1 (0x00002af3bf08e000)
libquadmath.so.0 => /act/gcc-4.7.2/lib64/libquadmath.so.0 (0x00002af3bf2a4000)
/lib64/ld-linux-x86-64.so.2 (0x0000003954400000)
libz.so.1 => /lib64/libz.so.1 (0x00002af3bf4d9000)
libnl.so.1 => /lib64/libnl.so.1 (0x0000003958800000)
What is wrong here? Is this because the InfiniBand libraries are not linked in the Open MPI case?
Open MPI 1.6 does not ship with device parameters for the Mellanox ConnectX HCA with part ID 4103 by default, which is easy to fix. Locate the [Mellanox Hermon] section in $PREFIX/share/openmpi/mca-btl-openib-device-params.ini and append 4103 to the end of the part ID list:
[Mellanox Hermon]
vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3,0x119f
vendor_part_id = 25408,25418,25428,...<skipped>...,26488,4099,4103
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128
Replace $PREFIX with the path to the Open MPI installation. In your case that would be /act/openmpi-1.6/gcc-4.7.2.
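If you would rather script the change than edit the file by hand, the following is a minimal sketch of one way to do it. It runs against a throwaway sample file whose ID lists are shortened placeholders (not the real contents shipped with Open MPI), and the variable names and the "[Some Other Device]" section are made up for illustration. It appends the part ID only inside the [Mellanox Hermon] section and only if it is not already listed.

```shell
#!/bin/sh
set -eu

# Throwaway sample file. The ID lists are shortened placeholders,
# not the real contents of the file shipped with Open MPI.
workdir=$(mktemp -d)
ini="$workdir/mca-btl-openib-device-params.ini"
cat > "$ini" <<'EOF'
[Mellanox Hermon]
vendor_id = 0x2c9,0x15b3
vendor_part_id = 25408,26488,4099
use_eager_rdma = 1
mtu = 2048
max_inline_data = 128

[Some Other Device]
vendor_part_id = 1,2
EOF

part_id=4103

# Track which section we are in; append the ID to vendor_part_id only
# inside [Mellanox Hermon], and only if it is not already in the list.
awk -v id="$part_id" '
    /^\[/ { in_section = ($0 == "[Mellanox Hermon]") }
    in_section && /^vendor_part_id/ {
        if ($0 !~ ("[=, ]" id "(,|$)")) $0 = $0 "," id
    }
    { print }
' "$ini" > "$ini.new"

# Show the modified line.
grep "$part_id" "$ini.new"
```

To apply the same idea to a real installation, you would point ini at $PREFIX/share/openmpi/mca-btl-openib-device-params.ini, keep a backup of the original, inspect the generated .new file, and then move it into place (this usually needs write access to the installation directory). If you cannot modify the file, the warning text itself says that setting the MCA parameter btl_openib_warn_no_device_params_found to 0 turns the message off, although that only hides it and the default device parameters are still used.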