My colleagues and I are considering buying a new server for deep learning with SXM2, NVLink, etc. Because of its POWER8 architecture, I expect some difficulties building the usual stack on it, e.g. Docker + TensorFlow for the deep learning frameworks. Does anyone have experience with whether the following setup will work, or should I expect difficulties / impossibilities?
For the setup described above, we found that the answer strongly depends on the use case. Here are our findings; maybe they help others who want to dive into this high-performance area and are unsure which architecture to buy.
Use case: Our use case is integration into an existing architecture (SLURM) and cloud services (mostly x86, e.g. at AWS). I therefore spoke to NVIDIA, and they recommended using NVLink (SXM2) on an x86 host. PCIe covers the standard socket-to-GPU communication, while NVLink over the SXM2 boards transparently takes over the GPU-to-GPU communication. The benefit is that training across the GPUs is blazing fast while deployment on x86 stays the same (the GPUs are also connected over PCIe).
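To make the "training fast, deployment unchanged" point concrete, here is a minimal sketch of my own (not from NVIDIA's recommendation), assuming TensorFlow 2.x: `MirroredStrategy` with NCCL all-reduce uses NVLink between the GPUs when it is present and falls back to PCIe otherwise, so the same script runs on both kinds of nodes.

```python
import numpy as np
import tensorflow as tf

# NCCL picks NVLink for the GPU-to-GPU all-reduce when it is available,
# otherwise it falls back to PCIe; the training code is identical either way.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.NcclAllReduce())

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(32,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

# Dummy data just to exercise the call; replace with a real dataset.
x = np.random.rand(1024, 32).astype("float32")
y = np.random.randint(0, 10, size=(1024,))
model.fit(x, y, batch_size=256, epochs=1)
```

You can verify which links the GPUs actually use with `nvidia-smi topo -m`.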
POWER8: If one wants the full POWER8 power, the use case would be true HPC, with NVLink all the way from the CPU socket to the GPUs. This, however, requires more deployment complexity, since the ppc64le software stack differs from x86. One needs to decide at the use-case level (e.g. high-end research) whether the POWER8 boost is needed.
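To illustrate the deployment-complexity point (again my own sketch, not from the original findings): any environment bootstrap has to branch on the CPU architecture, because prebuilt x86_64 wheels and Docker images do not run on ppc64le.

```python
import platform

# Prebuilt x86_64 wheels and Docker Hub images will not run on ppc64le,
# so a POWER node needs its own builds (e.g. IBM's ppc64le channels).
arch = platform.machine()
if arch == "ppc64le":
    print("POWER node: use ppc64le builds of TensorFlow, CUDA libs, etc.")
elif arch == "x86_64":
    print("x86 node: standard pip wheels and Docker Hub images work")
else:
    print(f"Unhandled architecture: {arch}")
```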
NVIDIA has a nice technical overview paper explaining all of this in more detail.