I am interested in getting GNU Parallel to run some numerical computation tasks on the GPU. Generically speaking, here is my initial approach:
This brought up the following questions:
Modern CPUs have multiple cores, which means they can run different instructions at the same time; so while core 1 is running a MUL, core 2 may be running an ADD. This is called MIMD - Multiple Instructions, Multiple Data.
GPUs, however, cannot run different instructions at the same time. They excel at running the same instruction on large amounts of data; SIMD - Single Instruction, Multiple Data.
Modern GPUs have multiple cores that are each SIMD.
So where does GNU Parallel fit into this mix?
GNU Parallel starts programs. If your program uses a GPU and you have only a single GPU in your system, GNU Parallel will not make much sense. But if you have, say, 4 GPUs in your system, then it makes sense to keep all 4 busy at the same time. So if your program reads the environment variable CUDA_VISIBLE_DEVICES to decide which GPU to run on, you can do something like this:
seq 10000 | parallel -j4 CUDA_VISIBLE_DEVICES='$(({%} - 1))' compute {}

Here {%} is the job slot number, which runs from 1 to 4 with -j4, so $(({%} - 1)) gives each of the 4 concurrent jobs its own GPU, numbered 0 to 3. The single quotes keep the local shell from evaluating $(( )) before GNU Parallel has substituted {%}.
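The slot-to-device arithmetic can be sanity-checked in plain shell, without GNU Parallel or a GPU installed. This sketch just substitutes the values 1 through 4 that {%} would take with -j4:

```shell
# Simulate GNU Parallel's {%} replacement string for -j4: the job
# slot number is 1-based (1..4), while CUDA device numbers are
# 0-based (0..3), hence the "- 1".
for slot in 1 2 3 4; do
  echo "job slot $slot -> CUDA_VISIBLE_DEVICES=$((slot - 1))"
done
```

Each of the 4 job slots maps to exactly one device, so as long as only one job runs per slot at a time, no two concurrent jobs share a GPU.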