c++ multithreading pointers producer-consumer

Function in consumer thread unable to access memory location


I have some code that processes images. Performance is critical, so I'm trying to implement multithreading using a BoundedBuffer. The image data is stored as unsigned char* (dictated by the SDK I'm using to process the image data).

The problem occurs in the processData function, which is called in the consumer thread. Inside processData there is another function (from the image processing SDK) that uses the cudaMemcpy2D function. The CUDA function always throws an exception saying Access violation reading location.

However, the CUDA function works fine if I call processData directly within the producer thread or within deposit. When I call processData from the consumer thread (as desired), I get the exception from the CUDA function. I even tried calling processData from fetch and got the same exception.

My guess is that after the data is deposited into the rawImageBuffer by the producer thread, somehow the memory pointed to by unsigned char* changes, thus the consumer thread (or fetch) actually sends bad image data to processData (and the cuda function).

This is what my code looks like:

void processData(vector<unsigned char*> unProcessedData)
{
    // Process the data
}

struct BoundedBuffer {
    queue<vector<unsigned char*>> buffer;
    int capacity;

    std::mutex lock;

    std::condition_variable not_full;
    std::condition_variable not_empty;

    BoundedBuffer(int capacity) : capacity(capacity) {}

    void deposit(vector<unsigned char*> vData) 
    {
        std::unique_lock<std::mutex> l(lock);

        bool bWait = not_full.wait_for(l, 3000ms, [this] {return buffer.size() != capacity; }); // Wait if full

        if (bWait)
        {
            buffer.push(vData); // only push data when timeout doesn't expire
            not_empty.notify_one();
        }           
    }

    vector<unsigned char*> fetch()
    {
        std::unique_lock<std::mutex> l(lock);

        not_empty.wait(l, [this]() {return buffer.size() != 0; }); // Wait if empty

        vector<unsigned char*> result{};

        result = buffer.front();
        buffer.pop();

        not_full.notify_one();

        return result;
    }
};

void producerTask(BoundedBuffer &rawImageBuffer)
{
    for(;;)
    {
        // Produce Data
        vector<unsigned char*> producedDataVec{dataElement0, dataElement1};
        rawImageBuffer.deposit(producedDataVec);
    } //loop breaks upon user interception
}

void consumerTask(BoundedBuffer &rawImageBuffer)
{
    for(;;)
    {
        vector<unsigned char*> fetchedDataVec{};
        fetchedDataVec = rawImageBuffer.fetch();
        processData(fetchedDataVec);
    } //loop breaks upon user interception 
}

int main()
{
        BoundedBuffer rawImageBuffer(6);

        thread consumer(consumerTask, ref(rawImageBuffer));
        thread producer(producerTask, ref(rawImageBuffer));

        consumer.join();
        producer.join();

        return 0;
}

Am I correct in my guess about why the exception is being thrown? How do I resolve this? For reference, each vector element contains data for a 2448px X 2048px image in RGBa 8bit format.

UPDATES:

  1. After someone pointed out in the comments that the unsigned char* pointers could be invalid, I found that the address held by each pointer is in fact a real memory location. In the exception Access violation reading location X, X is larger than the address the pointer points to.

  2. After some more debugging, I've found that the memory pointed to by the unsigned char* in the unProcessedData vector in processData doesn't remain intact: the pointer address is correct, but some blocks of memory are unreadable. I found this by printing each char behind the unsigned char* in processData. When processData is called by the producer thread (this is when CUDA doesn't throw the exception), all chars get printed nicely (I'm printing 2048*2448*4 chars, dictated by the aforementioned image resolution and format). But when processData is called by the consumer thread, printing the chars throws the same exception around the 40th char (around the 40th, not always exactly the 40th).

  3. Okay, so now I'm pretty sure that my pointers are pointing to real memory locations, and I also know that the first memory block pointed to by the pointer held the expected value every time I tested it. To test this, in producerTask I deliberately write a test value (such as int 42, or char '*') to the 0th memory block pointed to by the unsigned char*. In the processData function, I check whether the memory block still contains the test value, and it does. So now I know that some of the memory blocks pointed to by the pointer become unreadable for some unknown reason. Also, my test doesn't prove that the first memory block is immune to becoming unreadable, just that it didn't become unreadable in the few tests I ran. TLDR for Updates 1 to 3: The unprocessedImage pointers are valid, they point to a real memory address, and that address holds the expected value.

  4. Another debugging attempt. Now I'm using Visual Studio's memory window to visually inspect the data. The debugger tells me that unProcessedData[0] points to 0x00000279d7c76070. This is what memory around 0x00000279d7c76070 looks like: [image: image_memory_before]. The memory seems sensible: the RGBa format can be clearly seen, and the image is all black, so it makes sense that the RGB channels are close to 0 whereas alpha is ff. I scrolled down for a long time to see what the memory looks like, and all the way to 0x00000279D8F9606F the data looks good (RGBa values as expected). The 0x00000279D8F9606F number also makes sense because 0x00000279D8F9606F - 0x00000279d7c76070 = 20054015 (decimal), which means there are 20054016 valid chars, as expected (2048 height * 2448 width * 4 channels = 20054016). Okay, so far so good. Note that all this is right before running the CUDA function. After stepping through the CUDA function I get the same exception: Access violation reading location 0x00000279D80B8000. Note that 0x00000279D80B8000 is between 0x00000279d7c76070 and 0x00000279D8F9606F, the part of memory that I visually checked to be correct. After running the CUDA function, here is what the memory between 0x00000279d7c76070 and 0x00000279D8F9606F looks like: [image: image_memory_after].

  5. When I cout anything in processData before calling the CUDA function, the memory pointed to by the pointer changes. All the chars become 0xdd, as can be seen in this image: [image: image_memory_change]. A page on MSDN says: "The freed blocks kept unused in the debug heap's linked list when the _CRTDBG_DELAY_FREE_MEM_DF flag is set are currently filled with 0xDD." But when I call processData from the producer thread, the pointed-to memory doesn't change after I cout anything.

Right now the most upvoted comment on this question tells me to learn more about pointers. I am doing this currently (as my updates hopefully suggest), but which topics do I need to learn about? I do know how pointers work. I know my pointers are pointing to a valid memory location (see Update 2). I know some memory blocks pointed to by the pointer become unreadable (see Update 3). But I don't know why the memory blocks become unreadable. In particular, I don't know why they only become unreadable when processData is called from the consumer thread (note that no exception is thrown when processData is called from the producer thread). Is there anything else I can do to help narrow down this problem?


Solution

  • The problem was fairly simple. n.m.'s comments guided me in the right direction, and I'm thankful for that.

    In my updates I mentioned that printing anything using cout caused the data to become corrupt. It seemed like that was happening, but after putting some breakpoints in fetch and deposit, I got a complete picture of what was really going on.

    I produced the image data using another SDK supplied with the camera, which gave me the image data as a wrapped pointer. I then converted the image format and unwrapped the converted image to get the pointer to the raw image. That raw pointer was stored in producedDataVec and deposited into rawImageBuffer. The problem was that as soon as the converted image went out of scope, my data became corrupted. So the cout statements weren't really responsible for corrupting my data. With breakpoints placed everywhere, I could see the data becoming corrupt just after the converted image went out of scope. To resolve this, my producer now deposits the wrapped pointer directly into the buffer. The consumer fetches the wrapped pointer, converts the format to obtain the converted image, and then obtains the raw image pointer from it. Now the converted image only goes out of scope after processData has returned, so the exception is never thrown.
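
The ownership idea behind that fix can be sketched roughly as follows. This is only an illustration under assumptions, not the actual SDK code: WrappedImage is a hypothetical stand-in for the camera SDK's wrapper (it owns the pixels and frees them on destruction), OwningBuffer is a simplified queue without the capacity and timeout handling of BoundedBuffer, and the format conversion step is left out. The point it shows is that the queue carries owning handles (std::shared_ptr here), and the raw unsigned char* is only taken in the consumer while the owner is still alive.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SDK's wrapped image: owns its pixel memory.
struct WrappedImage {
    std::vector<unsigned char> pixels;
    WrappedImage(std::size_t n, unsigned char fill) : pixels(n, fill) {}
    unsigned char* data() { return pixels.data(); }
};

using ImagePtr = std::shared_ptr<WrappedImage>;

// Simplified queue (assumption: no capacity limit or timeout, unlike BoundedBuffer).
struct OwningBuffer {
    std::queue<std::vector<ImagePtr>> buffer;
    std::mutex lock;
    std::condition_variable not_empty;

    void deposit(std::vector<ImagePtr> images) {
        {
            std::lock_guard<std::mutex> l(lock);
            buffer.push(std::move(images));  // the queue now shares ownership
        }
        not_empty.notify_one();
    }

    std::vector<ImagePtr> fetch() {
        std::unique_lock<std::mutex> l(lock);
        not_empty.wait(l, [this] { return !buffer.empty(); });
        std::vector<ImagePtr> result = std::move(buffer.front());
        buffer.pop();
        return result;
    }
};

// Stand-in for processData: reads the first byte of every image and sums them.
int processData(const std::vector<unsigned char*>& unProcessedData) {
    int sum = 0;
    for (unsigned char* p : unProcessedData) {
        assert(p != nullptr);
        sum += p[0];
    }
    return sum;
}

int runDemo() {
    OwningBuffer buf;
    int result = 0;

    std::thread consumer([&] {
        std::vector<ImagePtr> images = buf.fetch();  // owners live in this scope
        std::vector<unsigned char*> raw;
        for (ImagePtr& img : images) raw.push_back(img->data());
        result = processData(raw);  // raw pointers are valid: images is still alive
    });

    std::thread producer([&] {
        std::vector<ImagePtr> images{std::make_shared<WrappedImage>(16, 42),
                                     std::make_shared<WrappedImage>(16, 1)};
        buf.deposit(std::move(images));
    });  // the producer's locals are destroyed, but the queued owners keep the pixels

    producer.join();
    consumer.join();
    return result;
}
```

Calling runDemo() from main exercises one producer/consumer round trip. If the queue stored only img->data() and the producer's images went out of scope, the consumer would be reading freed memory, which matches the 0xDD fill I saw in the debug heap.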