Computing physics and displaying it with GPU only

So basically, I've learnt OpenCL recently, and with this newfound power I made a physics simulation about 10 times faster. The issue is, I'm only using 10% of my GPU. I'm assuming this is because I'm sending the data back to the CPU/RAM and then back to the GPU again so it can be displayed. Anyone got ideas on how to avoid this round trip? I kinda want to use OpenCL for my graphics, but something tells me that's a bad idea. For context, I've never used OpenGL. This is all in C++, btw. Here's a pseudocode example of what my code looks like:

void start()
{
    CreateKernel();
    SendDataToKernel();
}

void update()
{
    RunKernel();
    auto [x, y] = ReadDataFromKernel(); // copies the result from the GPU back to the CPU
    Draw(std::round(x), std::round(y));
}

Solution

If you only observe 10% GPU usage, copying the frame buffer back and forth is not the bottleneck.

I've done a similar thing: physics simulations on the GPU with real-time rendering done directly in OpenCL. The rendered bitmap is copied to the CPU over PCIe and then sent to the display with SetBitmapBits from <Windows.h>, which goes back through the GPU. This works very efficiently and at 100% GPU utilization; examples are here and here. You could draw to the display directly via OpenCL-OpenGL interoperability to make it a bit more efficient, but this is really not necessary and won't solve your problem.

The solution is to make 2 threads on the CPU:

Compute thread: This runs the physics computation in an endless loop without any delay, calling the GPU compute kernel and compute_queue.finish(); in every iteration. This thread keeps the GPU at 100% load at all times.
Render thread: This runs the kernel for rendering the data, then copies the bitmap over and executes the drawing command to the screen. If this whole process takes less than 1/60 second, call Sleep for the remaining time to reduce the rendering load on the GPU and let it spend more time on the physics computation.

To make these two threads independent of each other, you should also make two OpenCL command queues, one for the compute thread and one for the render thread.
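
As a rough sketch of that two-thread layout (not my actual code), here is what the control flow could look like in plain C++. To keep it runnable without an OpenCL installation, the GPU work is replaced by stand-ins: Particle, physics_step, render_frame, Counts and run_simulation are all hypothetical names made up for this example, and std::this_thread::sleep_for is used as a portable substitute for Sleep. In a real program each thread would own its own OpenCL command queue, and the stand-ins would enqueue the kernels on it.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical stand-ins for the GPU work. In a real OpenCL program,
// physics_step() would enqueue the physics kernel on the compute queue and
// call compute_queue.finish(); render_frame() would enqueue the render
// kernel on the second queue, copy the bitmap over and draw it.
struct Particle { float x = 0.0f; float v = 1.0f; };

void physics_step(Particle& p) {
    const float dt = 1e-4f;
    p.x += p.v * dt;
    if (p.x < 0.0f || p.x > 1.0f) p.v = -p.v; // bounce off the walls
}

void render_frame(float x) { (void)x; /* draw the particle at x here */ }

struct Counts {
    std::uint64_t physics_steps;
    std::uint64_t frames;
};

// Runs both threads for `duration` and reports how much work each one did.
Counts run_simulation(std::chrono::milliseconds duration, int target_fps = 60) {
    assert(target_fps > 0);
    using clock = std::chrono::steady_clock;
    const auto frame_budget = std::chrono::duration_cast<clock::duration>(
        std::chrono::duration<double>(1.0 / target_fps));

    std::atomic<bool> running{true};
    std::atomic<float> position{0.0f};
    std::atomic<std::uint64_t> steps{0};
    std::atomic<std::uint64_t> frames{0};

    // Compute thread: endless loop without any delay.
    std::thread compute([&] {
        Particle p;
        while (running.load()) {
            physics_step(p);
            position.store(p.x); // publish the latest state for rendering
            ++steps;
        }
    });

    // Render thread: draw, then sleep for the rest of the frame budget.
    std::thread render([&] {
        while (running.load()) {
            const auto frame_start = clock::now();
            render_frame(position.load());
            ++frames;
            const auto elapsed = clock::now() - frame_start;
            if (elapsed < frame_budget)
                std::this_thread::sleep_for(frame_budget - elapsed);
        }
    });

    std::this_thread::sleep_for(duration);
    running = false;
    compute.join();
    render.join();
    return {steps.load(), frames.load()};
}
```

Calling run_simulation(std::chrono::milliseconds(250)) should report far more physics steps than rendered frames, which is the point of the split: the render thread is capped at the target frame rate, while the compute thread never waits for it.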