cuda gpu thrust push-back

Does thrust::device_vector::push_back() cause a call to memcpy?


Summary
I'd like some clarification on how the thrust::device_vector works.

AFAIK, writing to an indexed location such as device_vector[i] = 7 is carried out from the host, and therefore causes a call to memcpy. Does device_vector.push_back(7) also call memcpy?

Background

I'm working on a project comparing stock prices. The prices are stored in two vectors. I iterate over the two vectors, and whenever their prices change relative to each other, I write that change into a new vector. So I never know in advance how long the resulting vector is going to be. On the CPU the natural way to do this is with push_back, but I don't want to use push_back on the GPU vector if it's going to call memcpy every time.

Is there a more efficient way to build a vector piece by piece on the GPU?

Research
I've looked at this question, but it (and others) are focused on the most efficient way to access elements from the host. I want to build up a vector on the GPU.

Thank you.


Solution

  • Does device_vector.push_back(7) also call memcpy?

    No. It does, however, result in a kernel launch per call.

  • Is there a more efficient way to build a vector piece by piece on the GPU?

    Yes.

    Build it (or large segments of it) in host memory first, then copy or insert it into device memory in a single operation. One bulk transfer replaces many tiny per-element operations, so you will greatly reduce latency and increase PCIe bus utilization by doing so.
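
A minimal sketch of that advice, under stated assumptions: the helper names `sign_of` and `find_changes` and the sample prices are hypothetical, and "a change relative to each other" is interpreted here as the sign of a[i] - b[i] flipping between consecutive elements, which may not match the asker's actual rule. The result is built with push_back on a std::vector in host memory, then handed to the device in one step through the device_vector range constructor.

```cuda
#include <thrust/device_vector.h>

#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: -1, 0 or +1 depending on the sign of x.
static int sign_of(double x) { return (x > 0) - (x < 0); }

// Hypothetical rule (an assumption, not the asker's code): record index i
// whenever the sign of (a[i] - b[i]) differs from that of the previous element.
// The result grows with push_back in host memory, where that is cheap.
std::vector<int> find_changes(const std::vector<double>& a,
                              const std::vector<double>& b)
{
    std::vector<int> changes;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 1; i < n; ++i) {
        if (sign_of(a[i] - b[i]) != sign_of(a[i - 1] - b[i - 1]))
            changes.push_back(static_cast<int>(i));
    }
    return changes;
}

int main()
{
    // Made-up sample prices, for illustration only.
    std::vector<double> a = {10.0, 10.5, 11.0, 10.2, 10.1};
    std::vector<double> b = {10.2, 10.4, 10.8, 10.6, 10.0};

    // Step 1: build the variable-length result entirely on the host.
    std::vector<int> changes = find_changes(a, b);

    // Step 2: one device allocation and one bulk host-to-device copy,
    // instead of one device-side operation per element.
    thrust::device_vector<int> d_changes(changes.begin(), changes.end());

    std::printf("%zu changes copied to the device\n", d_changes.size());
    return 0;
}
```

If the result is produced in segments rather than all at once, the same idea applies per segment: fill a host-side chunk, then append it with a single range insert such as `d_changes.insert(d_changes.end(), chunk.begin(), chunk.end())`, where `chunk` is a hypothetical std::vector holding the segment.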