Tags: cuda, machine-learning, gpu, multi-gpu

Guidance needed on distributing data across multiple GPUs


I'm currently developing a machine learning toolkit for GPU clusters. I have tested a logistic regression classifier on multiple GPUs.

I'm using a master-worker approach, where a master CPU thread creates several POSIX threads and the matrices are divided among the GPUs.
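For context, the pattern described above can be sketched roughly as follows: the master spawns one POSIX thread per GPU, and each worker binds to its device and copies its horizontal slice of the matrix over. This is only an illustrative outline, not the toolkit's actual code; the names (`worker_args`, `worker`) and the kernel launch site are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct {
    int device;        /* GPU this worker drives            */
    const float *rows; /* this worker's slice (host memory) */
    int n_rows;        /* rows in the slice                 */
    int n_cols;        /* columns in the matrix             */
} worker_args;

static void *worker(void *p) {
    worker_args *a = (worker_args *)p;
    cudaSetDevice(a->device);  /* bind this thread to its GPU */

    size_t bytes = (size_t)a->n_rows * a->n_cols * sizeof(float);
    float *d_rows;
    cudaMalloc(&d_rows, bytes);
    cudaMemcpy(d_rows, a->rows, bytes, cudaMemcpyHostToDevice);

    /* ... launch logistic-regression kernels on d_rows here ... */

    cudaFree(d_rows);
    return NULL;
}

int main(void) {
    int n_gpus = 0;
    cudaGetDeviceCount(&n_gpus);

    const int n_rows = 1024, n_cols = 256;
    float *matrix = malloc((size_t)n_rows * n_cols * sizeof(float));

    pthread_t threads[16];
    worker_args args[16];
    int chunk = n_rows / n_gpus;

    for (int i = 0; i < n_gpus; ++i) {
        args[i].device = i;
        args[i].rows   = matrix + (size_t)i * chunk * n_cols;
        /* last worker takes any leftover rows */
        args[i].n_rows = (i == n_gpus - 1) ? n_rows - i * chunk : chunk;
        args[i].n_cols = n_cols;
        pthread_create(&threads[i], NULL, worker, &args[i]);
    }
    for (int i = 0; i < n_gpus; ++i)
        pthread_join(threads[i], NULL);

    free(matrix);
    return 0;
}
```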

But the problem I have is how to store large matrices that can't fit on a single machine. Are there any libraries or approaches for sharing data among nodes?


Solution

  • I'm not sure how big your matrices are, but you should check out CUDA 4.0, which was released a couple of weeks ago. One of its main features is memory sharing across multiple CUDA devices/GPUs (unified virtual addressing and peer-to-peer memory access).
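    A minimal sketch of what that feature looks like in code, assuming CUDA 4.0+ and two peer-capable GPUs on the same node (device IDs 0 and 1 are assumptions):

    ```cuda
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void) {
        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, 0, 1);
        if (!can_access) {
            printf("GPUs 0 and 1 cannot access each other's memory\n");
            return 1;
        }

        float *d0, *d1;
        size_t bytes = 1024 * sizeof(float);

        cudaSetDevice(0);
        cudaMalloc(&d0, bytes);
        cudaDeviceEnablePeerAccess(1, 0); /* let GPU 0 map GPU 1's memory */

        cudaSetDevice(1);
        cudaMalloc(&d1, bytes);

        /* direct copy between the two devices' memories */
        cudaMemcpyPeer(d0, 0, d1, 1, bytes);

        cudaSetDevice(1);
        cudaFree(d1);
        cudaSetDevice(0);
        cudaFree(d0);
        return 0;
    }
    ```

    Note this covers GPUs within a single node; for matrices spread across cluster nodes you would still need something like MPI on top.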