GPU programming: CUDA, OpenCL, or something else?

What is the best way to program for GPUs?

I know:

CUDA is very good, with a lot of developer support and very nice to debug, but it runs only on Nvidia hardware.
OpenCL is very flexible: it runs on Nvidia, AMD and Intel hardware, on accelerators, GPUs and CPUs, but as far as I know it is no longer supported by Nvidia.
Coriander (https://github.com/hughperkins/coriander), which converts CUDA to OpenCL.
HIP (https://github.com/ROCm-Developer-Tools/HIP) is made by AMD so that code can be written once and converted to run on AMD and on Nvidia (CUDA) hardware. It can also convert CUDA to HIP.

OpenCL would be my preferred way, since I want to be very flexible in hardware support. But if it is no longer supported by Nvidia, that is a knockout. HIP then sounds best to me, with separate release files per vendor. But how well will Intel's upcoming hardware be supported?

Are there any other options? Important to me are broad hardware support, long-term support (so that the code can still be compiled in some years), and manufacturer independence. Additionally, it should work with more than one compiler and be supported on Linux and Windows.

Solution

Nvidia won't cancel OpenCL support anytime soon.

A newly emerging approach for portable GPU code is SYCL. It enables higher-level programming from a single source file that is then compiled twice, once for the CPU and once for the GPU. The GPU part then runs on the GPU via OpenCL, CUDA or some other backend.

Right now, however, the best-supported GPU framework across vendors is OpenCL 1.2, which is very well established at this point. CPU code (C/C++) and GPU code (OpenCL C) are clearly separated, which makes it clear which part runs where and when data needs to be copied between CPU and GPU. OpenCL runs on 10-year-old GPUs, on all of the latest and fastest data-center GPUs, on all gaming and workstation GPUs, and even on CPUs if you need more memory. All vendors (Nvidia, AMD, Intel, Apple, ARM, ...) are supported. On Nvidia GPUs there is no performance/efficiency tradeoff at all compared to CUDA; it runs just as fast.
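To make the CPU/GPU split a bit more concrete, here is a minimal sketch rather than a complete OpenCL program. It assumes no OpenCL runtime is installed, so nothing is actually sent to a GPU: the device code is just kept as an OpenCL C string, and the host side contains a plain C++ reference loop that computes the same result, the kind of thing you might use to check a kernel's output. The names `kVecAddSource`, `vec_add` and `vec_add_reference` are made up for this illustration. The comments list the standard OpenCL 1.2 API calls where the copies would happen in a real program.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Device side: OpenCL C source, kept as a string. In a real program the
// vendor's driver compiles it at run time (clCreateProgramWithSource followed
// by clBuildProgram). Each work-item processes exactly one index i.
static const std::string kVecAddSource = R"CLC(
__kernel void vec_add(__global const float* a,
                      __global const float* b,
                      __global float* c) {
    const size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}
)CLC";

// Host side: a CPU reference that does the same work in an ordinary loop.
// With a real OpenCL runtime, the host steps would instead be roughly:
//   1. clCreateBuffer / clEnqueueWriteBuffer   (copy a and b: CPU -> GPU)
//   2. clSetKernelArg + clEnqueueNDRangeKernel (run vec_add on a.size() work-items)
//   3. clEnqueueReadBuffer                     (copy c:       GPU -> CPU)
std::vector<float> vec_add_reference(const std::vector<float>& a,
                                     const std::vector<float>& b) {
    assert(a.size() == b.size());  // the kernel also expects equal lengths
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];  // same per-index computation as the kernel body
    }
    return c;
}
```

Calling `vec_add_reference({1.0f, 2.0f, 3.0f}, {10.0f, 20.0f, 30.0f})` returns `{11.0f, 22.0f, 33.0f}`. The loop index on the host plays the role of `get_global_id(0)` on the device, and the three commented steps are the only places where data crosses between CPU and GPU memory.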

If you choose to start with OpenCL, have a look at this OpenCL-Wrapper. The native OpenCL C++ bindings are a bit cumbersome, and this lightweight wrapper simplifies learning and development a lot, while keeping functionality and full performance.

CUDA is a proprietary GPU language that only works on Nvidia GPUs. It offers no performance advantage over OpenCL/SYCL, but limits the software to run on Nvidia hardware only.

HIP is AMD's vendor-specific GPU language. At the time of writing it is officially supported on only 7 very expensive AMD datacenter/workstation GPU models. It offers the same performance as OpenCL/SYCL, but limits the software to a subset of AMD hardware.

The porting tools are great if you already have a large code base, but performance could suffer. My advice is to pick one open framework (OpenCL or SYCL) and stay fully committed to it, rather than starting with a vendor-specific language (CUDA/HIP) that only runs on Nvidia or AMD GPUs and then generating poorly optimized ports to support the other hardware. Porting is a lot of trouble, and maintaining multiple variants of the same code in different vendor-specific languages is even more trouble. With OpenCL or SYCL you only need one implementation and it runs everywhere.