
Use NVIDIA K20 cards on virtual machines on the same server with different CUDA SDK versions


I work on a dual-processor Debian Wheezy server with 4 NVIDIA K20m cards. I currently use CUDA 5 with the 304.54 driver and GCC 4.6.3, but I would like to upgrade to Debian Jessie (GCC 4.9) and CUDA 7.5. I have already evaluated CUDA 7.5, which gives me different results than CUDA 5 because NVCC chooses different instructions (e.g. FMA instructions are not used in the same places, see post).

The main goal is to have two different CUDA versions on this server, both to stay compatible with older computations and to prepare for new CUDA features.

I think there are two possibilities:

  • A VMware ESXi or Citrix XenServer hypervisor, which would let me create two virtual machines (Wheezy/SDK 5 and Jessie/SDK 7.5) connected to the K20 cards in passthrough mode. I cannot find these cards in the hypervisors' hardware compatibility lists, but one NVIDIA driver's release notes say they support passthrough (320.78 release notes, page 11). Which driver would I have to install at the hypervisor level?
  • Install the latest NVIDIA driver and use two NVIDIA Docker containers with different CUDA SDK and Debian versions. Is it possible to run SDK 5 with the latest driver?

What do you think of these possibilities? Do you have any other ideas?

Thank you very much.


Solution

  • I can't comment on the virtualisation suggestion. However, there is no problem in running the most recent release driver (the CUDA 7.5 driver at the time of writing) and using older toolkits with it.

    Each CUDA toolkit release and its components are fully versioned, so you cannot mix the CUDA runtime and other libraries (cuFFT, cuBLAS, etc.) from different toolkit releases, or your own code built against them. However, drivers and the driver API they expose are backwards compatible, so you can use the CUDA 7.5 driver and driver API with either the CUDA 5 or the CUDA 7.5 runtime without difficulty. You cannot, however, run a newer runtime on an older driver; that will generate a runtime error. I have found the modules utility very useful for selecting between toolkit/runtime versions for development and testing. My current development box has every release between 4.2 and 7.5 installed, with the 7.5 driver.

    Note also that older toolchains require older host compilers and support libraries. So if you move to a more modern distribution, you will still need to devise a way to keep a supported gcc installation for the older toolkit you want to use (see the release notes of your toolkits and this question for more details). Many distributions have built-in systems for managing multiple compiler versions, but it has been many years since I ran Debian, so I am not sure about the specifics of selecting an alternative compiler version there.
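
The two rules above (the driver must be at least as new as the runtime, and each toolkit needs its own host compiler) can be sketched in a few lines of Python. This is only an illustration, not part of any NVIDIA tooling: the helper names are hypothetical, and the install paths in the usage note are assumed examples. The only facts relied on are that `cudaDriverGetVersion` and `cudaRuntimeGetVersion` report versions as `1000 * major + 10 * minor`, and that `nvcc` accepts `-ccbin` to choose the host compiler.

```python
def decode_cuda_version(version):
    """Split an integer version, as reported by cudaDriverGetVersion or
    cudaRuntimeGetVersion (1000 * major + 10 * minor), into (major, minor)."""
    return version // 1000, (version % 1000) // 10


def driver_supports_runtime(driver_version, runtime_version):
    """Drivers are backwards compatible: a runtime works if the driver's
    supported CUDA version is at least as new as the runtime's version."""
    return driver_version >= runtime_version


def nvcc_command(toolkit_root, host_compiler, source, output):
    """Build an nvcc command line that pins both the toolkit and the host
    compiler. All arguments are supplied by the caller (hypothetical helper)."""
    return [
        f"{toolkit_root}/bin/nvcc",  # nvcc from the chosen toolkit install
        "-ccbin", host_compiler,     # host compiler supported by that toolkit
        "-o", output,
        source,
    ]
```

For example, with a 7.5 driver (`7050`) both the 5.0 runtime (`5000`) and the 7.5 runtime pass the check, while a 7.5 runtime on a 5.0 driver does not. A call such as `nvcc_command("/usr/local/cuda-5.0", "/usr/bin/gcc-4.6", "kernel.cu", "kernel")` shows how an older toolkit could be paired with an older gcc on a newer distribution, assuming those paths exist on your system.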