This is a bad title, but hopefully my description is clearer. I manage a modeling and simulation application that is decades old. For a long time we have wanted to move some of the code onto GPUs because we believe it will speed up the simulations (yes, we are very behind the times). We finally have the opportunity (i.e. money) to do this, and now we want to understand the consequences, specifically for sustaining the code.

The problem is that many of our users do not have high-end GPUs (at the moment), so we would still need the code to support normal CPU processing as well as GPU processing (i.e. I believe we will end up with two sets of code performing very similar operations). Has anyone been through this and have any lessons learned and/or advice they would like to share?

If it helps, our current application is written in C++ and we are looking at going with NVIDIA and writing the GPU code in CUDA.
This is similar to maintaining a hand-crafted assembly version (vectorized or otherwise hand-tuned) alongside a C/C++ version. There is a lot of long-term experience with doing that out there, and this advice is based on it. (My own experience doing this for GPUs is both shorter term (a few years) and smaller (a few cases).)
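To make the "two sets of code" concern concrete, here is a minimal sketch of the shape this usually takes: one operation with a plain C++ implementation and a CUDA implementation behind a single entry point that falls back to the CPU when no GPU is available. The saxpy example and the runtime device check are hypothetical, just to illustrate the structure, not anything specific to your application.

```cpp
// Minimal sketch (hypothetical names): one operation, two implementations,
// selected at runtime so callers never care which path ran.
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

// Plain C++ path: the simpler "reference" implementation.
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (std::size_t i = 0; i < x.size(); ++i)
        y[i] = a * x[i] + y[i];
}

__global__ void saxpy_kernel(float a, const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// CUDA path: same contract, different machinery.
void saxpy_gpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    int n = static_cast<int>(x.size());
    float *dx = nullptr, *dy = nullptr;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    saxpy_kernel<<<(n + 255) / 256, 256>>>(a, dx, dy, n);
    cudaMemcpy(y.data(), dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}

// Single entry point: fall back to the CPU when no usable GPU is present.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    int devices = 0;
    bool has_gpu = (cudaGetDeviceCount(&devices) == cudaSuccess) && devices > 0;
    if (has_gpu) saxpy_gpu(a, x, y);
    else         saxpy_cpu(a, x, y);
}
```

Keeping the two paths behind one interface like this means the rest of the application never knows (or cares) which implementation it got, which is what makes maintaining both tolerable.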
You will want to write unit tests.
The unit tests use the CPU implementations to test the GPU implementations, because I have yet to find a situation where the CPU version is not the simpler of the two.
Each test runs a few simulations/models and asserts that the results are identical where possible (or within a tolerance where floating-point differences make exact matches unrealistic). These run nightly, and/or with every change to the code base, as part of the acceptance suite.
This ensures that neither code base goes "stale", because both are constantly exercised, and the two independent implementations actually help with maintaining each other.
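As a rough sketch of what such a comparison test can look like (reusing the hypothetical saxpy_cpu/saxpy_gpu pair from the earlier snippet, and comparing within a tolerance because CPU and GPU floating-point results rarely match bit-for-bit):

```cpp
// Sketch of a CPU-vs-GPU comparison test; a real project would hang this
// off its test framework (GoogleTest, Catch2, ...) instead of main/assert.
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical implementations from the earlier sketch.
void saxpy_cpu(float a, const std::vector<float>& x, std::vector<float>& y);
void saxpy_gpu(float a, const std::vector<float>& x, std::vector<float>& y);

int main() {
    const std::size_t n = 1 << 20;
    std::vector<float> x(n), y_cpu(n), y_gpu(n);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = static_cast<float>(std::rand()) / RAND_MAX;
        y_cpu[i] = y_gpu[i] = static_cast<float>(std::rand()) / RAND_MAX;
    }

    saxpy_cpu(2.0f, x, y_cpu);   // reference result
    saxpy_gpu(2.0f, x, y_gpu);   // result under test

    // Compare within a tolerance rather than demanding exact equality.
    for (std::size_t i = 0; i < n; ++i)
        assert(std::fabs(y_cpu[i] - y_gpu[i]) <=
               1e-5f * std::fabs(y_cpu[i]) + 1e-6f);
    return 0;
}
```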
Another approach is to run blended solutions. Sometimes running a mix of CPU and GPU is faster than one or the other, even if they are both solving the same problem.
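A toy illustration of that kind of blended split is below. The scale_blended function and the fixed gpu_fraction are made up for the example; real code would pick the split from measurements and use pinned memory and streams to get true copy/compute overlap.

```cpp
// Sketch of a blended run (hypothetical split): hand the first part of the
// array to the GPU, process the tail on the CPU while the kernel runs,
// then synchronize before using the combined result.
#include <cuda_runtime.h>
#include <vector>

__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

void scale_blended(std::vector<float>& data, float factor, float gpu_fraction) {
    int n = static_cast<int>(data.size());
    int n_gpu = static_cast<int>(n * gpu_fraction);   // split tuned by measurement

    float* d = nullptr;
    cudaMalloc(&d, n_gpu * sizeof(float));
    // Pageable memory keeps this simple; pinned memory would allow real overlap.
    cudaMemcpyAsync(d, data.data(), n_gpu * sizeof(float), cudaMemcpyHostToDevice);
    scale_kernel<<<(n_gpu + 255) / 256, 256>>>(d, n_gpu, factor);

    // CPU works on its share while the GPU is busy.
    for (int i = n_gpu; i < n; ++i)
        data[i] *= factor;

    // Synchronous copy-back also waits for the kernel to finish.
    cudaMemcpy(data.data(), d, n_gpu * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
}
```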
When you have to switch technology (say, to a new GPU language, to a distributed network of devices, or to whatever new whiz-bang shows up in the next 20 years), the "simpler" CPU implementation will be a life saver.