Tags: cuda, openmpi, infiniband, gpudirect

How to use GPUDirect RDMA with InfiniBand


I have two machines, each with multiple Tesla cards and an InfiniBand card. I want to communicate between GPUs on different machines through InfiniBand; simple point-to-point unicast would be fine. I definitely want to use GPUDirect RDMA so I can spare myself extra copy operations.

I am aware that Mellanox now provides a driver for its InfiniBand cards, but it doesn't come with a detailed development guide. I am also aware that OpenMPI supports the feature I am asking about, but OpenMPI is too heavyweight for this simple task, and it does not support multiple GPUs in a single process.

I wonder if I could get any help with using the driver directly to do the communication. A code sample, a tutorial, anything would be good. I would also appreciate it if anyone could point me to the code that handles this in OpenMPI.


Solution

  • For GPUDirect RDMA to work, you need the following installed:

      • Mellanox OFED
      • A recent NVIDIA CUDA suite (driver and toolkit)
      • The Mellanox-NVIDIA GPUDirect plugin (the nv_peer_mem kernel module)

    All of the above should be installed (in the order listed above) and the relevant kernel modules loaded. After that, you should be able to register memory allocated in GPU video memory for RDMA transactions. Sample code will look like:

    #include <cuda_runtime.h>
    #include <infiniband/verbs.h>

    /* pd is a struct ibv_pd* obtained earlier from ibv_alloc_pd() */
    void *gpu_buffer;
    struct ibv_mr *mr;
    const int size = 64 * 1024;
    cudaMalloc(&gpu_buffer, size); // TODO: check the returned cudaError_t
    mr = ibv_reg_mr(pd, gpu_buffer, size,
                    IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_READ);
    // TODO: check that mr != NULL

    This will create (on a GPUDirect RDMA-enabled system) a memory region with a valid memory key that you can use for RDMA transactions with the HCA.
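
    To see how the registered region is actually used, here is a minimal sketch of posting an RDMA write of the GPU buffer to a peer. It assumes a connected queue pair qp, a completion queue cq, and a remote address remote_addr plus remote key remote_rkey exchanged out of band (for example over a TCP socket); none of these appear in the snippet above, so treat those names as placeholders.

    /* Hypothetical continuation: qp, cq, remote_addr and remote_rkey are
       assumed to have been set up / exchanged beforehand. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)gpu_buffer, /* GPU memory registered above */
        .length = size,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr = {0}, *bad_wr = NULL;
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr; /* peer's registered address */
    wr.wr.rdma.rkey        = remote_rkey; /* peer's rkey */

    if (ibv_post_send(qp, &wr, &bad_wr)) {
        /* handle the error (return value is an errno code) */
    }

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ; /* busy-poll until the write completes; then check wc.status */

    When you are done, release the resources with ibv_dereg_mr(mr) followed by cudaFree(gpu_buffer).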

    For more details about using RDMA and InfiniBand verbs in your code, you can refer to this document.