Does `std::vector`'s iterator constructor copy the data?


Inside a function call, I have a dynamically allocated array that I want to fill a vector with. The context here is that I know I can't return a pointer because it goes out of scope after the return.

My question is about the safety of calling free on the created pointer once the vector has been constructed. Does the vector take ownership of the pointer, and therefore responsibility for destroying it, or does it simply copy its data? My concern is that if I call free and the vector's underlying array is the same memory as the original, I will destroy its contents early.

__host__ 
std::vector<float> mult(std::vector<float> x, float scalar) {

    int n = x.size();

    int n_threads = 256;
    int n_blocks = (n + n_threads - 1) / n_threads; // round up: ceil(n / n_threads) truncates before ceil runs
    size_t bytes = n * sizeof(float);    

    float *d_x; // gpu "device" inputs
    float *d_y; // gpu "device" outputs
    float *h_y; // cpu "host" outputs
    
    h_y = (float *)malloc(n * sizeof(float));
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);

    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    mult<<<n_blocks, n_threads>>>(n, d_x, scalar, d_y);
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_x);
    cudaFree(d_y);

    std::vector<float> y(h_y, h_y + sizeof(h_y));

    free(h_y); // <<<< CALL IN QUESTION
    return y;
}

The docs say:

(3) range constructor: Constructs a container with as many elements as the range [first,last), with each element emplace-constructed from its corresponding element in that range, in the same order.

I googled "emplace constructed" but found no definition of the term, and I suspect it is the key to my question.


Solution

  • The bit that is interesting:

        // Step 1: Allocate memory.
        float *h_y; // cpu "host" outputs
        h_y = (float *)malloc(n * sizeof(float));
    
    
        // Step 2: Copy data from source into allocated memory
        cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);
    
        // Step 3: Copy data from allocated memory to vector
        //         With a bug.
        std::vector<float> y(h_y, h_y + sizeof(h_y));
    
        // Step 4: Free Allocated memory.
        free(h_y);
    

    Your initial question: does "Step 4: free(h_y)" mess with the vector's memory?

    Short Answer: No.
    Long Answer: Containers (like vector) manage their own memory. The vector allocates its own buffer and copies the data from the range into it; it never takes ownership of h_y. So freeing h_y is safe, and you are still responsible for doing so.
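A minimal host-only sketch of that behavior, runnable without CUDA: a plain malloc'd buffer stands in for the cudaMemcpy destination, and copy_then_free is a hypothetical helper name, not part of the question's code. It checks that the vector's storage is distinct from h_y before the buffer is freed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: fills a malloc'd buffer, builds a vector from it,
// frees the buffer, and returns the vector. This is safe because the range
// constructor copies the elements into storage the vector allocates itself.
std::vector<float> copy_then_free(std::size_t n) {
    float *h_y = static_cast<float *>(std::malloc(n * sizeof(float)));
    if (h_y == nullptr) return {};  // allocation failed (or n == 0)
    for (std::size_t i = 0; i < n; ++i) h_y[i] = static_cast<float>(i) * 1.5f;

    std::vector<float> y(h_y, h_y + n);  // copies n floats into y's own buffer
    assert(n == 0 || y.data() != h_y);   // y does not alias the malloc'd memory

    std::free(h_y);  // releases only the temporary buffer
    return y;        // y's elements are still valid
}
```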

    BUT: You have a bug:

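    std::vector<float> y(h_y, h_y + sizeof(h_y));
    // sizeof(h_y) is the size of the pointer (probably 8 bytes),
    // not the size of the allocated memory.
    // Pointer arithmetic counts in elements, so this copies sizeof(float *)
    // floats (8 on a typical 64-bit build) no matter what n is, and reads
    // past the end of the buffer if n is smaller than that.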
    

    You probably wanted:

    std::vector<float> y(h_y, h_y + n);
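To see the difference without a GPU, here is a small sketch; buggy_copy and correct_copy are hypothetical names. The buggy version's length depends only on the pointer type, so the caller must pass a buffer of at least sizeof(float *) elements or it reads out of bounds, which is exactly the hazard in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reproduces the bug: sizeof(h_y) is the size of the pointer variable,
// so the range always spans sizeof(float *) elements (8 on typical 64-bit
// builds), whatever the real buffer length is. The caller must supply at
// least that many elements or this reads out of bounds.
std::vector<float> buggy_copy(const float *h_y) {
    return std::vector<float>(h_y, h_y + sizeof(h_y));
}

// The fix: pass the element count explicitly.
std::vector<float> correct_copy(const float *h_y, std::size_t n) {
    return std::vector<float>(h_y, h_y + n);  // exactly n elements
}
```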
    

    Emplace Constructed:

    This is just a fancy way of saying that the vector avoids default-constructing each element and then copying a value into it. Instead, each element is constructed "in place" in the vector's own storage, by calling its constructor with the corresponding item from the range as the argument.

    Since float is a fundamental type, there is no constructor to run: each value is simply copied from the range into the vector's allocated memory.
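One way to observe this, assuming a hypothetical Tracker type that counts how its instances are created: on mainstream standard libraries, building a vector<Tracker> from a range of ints constructs each element once, directly from the corresponding int, with no default construction and no copies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type that counts how each instance was created.
struct Tracker {
    static int defaults, from_int, copies;
    int value;
    Tracker() : value(0) { ++defaults; }
    Tracker(int v) : value(v) { ++from_int; }  // converting constructor
    Tracker(const Tracker &o) : value(o.value) { ++copies; }
};
int Tracker::defaults = 0;
int Tracker::from_int = 0;
int Tracker::copies = 0;

// Builds a vector<Tracker> from a range of ints with the range constructor.
// Each element is constructed in place from *it, i.e. via Tracker(int).
std::vector<Tracker> build_from_ints(const int *first, const int *last) {
    Tracker::defaults = Tracker::from_int = Tracker::copies = 0;  // reset counters
    return std::vector<Tracker>(first, last);
}
```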

    What you should be doing:

    Don't allocate a temporary buffer. Size the vector up front and copy directly from the device into the vector's storage.

        // Step 1: Allocate memory (in the vector)
        std::vector<float> y(n);
    
        // Step 2: Copy data from source into allocated memory
        cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost);
    
        // Note: y.data() is the address of the first element in the vector
        //       (the same as &y[0] when y is non-empty); vectors are contiguous.
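A host-only sketch of the same two steps, assuming std::memcpy as a stand-in for cudaMemcpy so it runs without a GPU; copy_into_vector is a hypothetical name and src plays the role of d_y. In the real CUDA version you would pass y.data() (or &y[0] for a non-empty vector) as the destination of cudaMemcpy with cudaMemcpyDeviceToHost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-only sketch: std::memcpy stands in for
// cudaMemcpy(..., cudaMemcpyDeviceToHost); `src` plays the role of d_y.
std::vector<float> copy_into_vector(const float *src, std::size_t n) {
    std::vector<float> y(n);  // step 1: the vector allocates n floats
    if (n > 0) {
        // step 2: copy straight into the vector's contiguous storage
        std::memcpy(y.data(), src, n * sizeof(float));
    }
    return y;  // no temporary buffer, nothing to free
}
```

Returning y by value moves it (or the copy is elided), so the elements are not copied a second time.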