I have some code that processes images. Performance is critical, so I'm trying to implement multi-threading using a BoundedBuffer. The image data is stored as unsigned char* (dictated by the SDK I'm using to process the image data).
The problem occurs in the processData function, which is called in the consumer thread. Inside processData there is another function (from the image processing SDK) that uses the cudaMemcpy2D function. The CUDA function always throws an exception saying Access violation reading location.
However, the CUDA function works fine if I call processData directly within the producer thread or from deposit. When I call processData from the consumer thread (as desired), I get the exception from the CUDA function. I even tried calling processData from fetch and got the same exception.
My guess is that after the data is deposited into the rawImageBuffer by the producer thread, the memory pointed to by the unsigned char* somehow changes, so the consumer thread (or fetch) actually sends bad image data to processData (and the CUDA function).
This is what my code looks like:
#include <chrono>
#include <condition_variable>
#include <functional>   // std::ref
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

using namespace std;    // the snippet uses unqualified names and the 3000ms literal

void processData(vector<unsigned char*> unProcessedData)
{
    // Process the data
}

struct BoundedBuffer {
    queue<vector<unsigned char*>> buffer;
    int capacity;
    std::mutex lock;
    std::condition_variable not_full;
    std::condition_variable not_empty;

    BoundedBuffer(int capacity) : capacity(capacity) {}

    void deposit(vector<unsigned char*> vData)
    {
        std::unique_lock<std::mutex> l(lock);
        bool bWait = not_full.wait_for(l, 3000ms, [this] { return buffer.size() != capacity; }); // Wait if full
        if (bWait)
        {
            buffer.push(vData); // only push data when timeout doesn't expire
            not_empty.notify_one();
        }
    }

    vector<unsigned char*> fetch()
    {
        std::unique_lock<std::mutex> l(lock);
        not_empty.wait(l, [this]() { return buffer.size() != 0; }); // Wait if empty
        vector<unsigned char*> result{};
        result = buffer.front(); // copies the pointers only, not the image bytes
        buffer.pop();
        not_full.notify_one();
        return result;
    }
};

void producerTask(BoundedBuffer &rawImageBuffer)
{
    for (;;)
    {
        // Produce Data
        vector<unsigned char*> producedDataVec{dataElement0, dataElement1};
        rawImageBuffer.deposit(producedDataVec);
    } // loop breaks upon user interception
}

void consumerTask(BoundedBuffer &rawImageBuffer)
{
    for (;;)
    {
        vector<unsigned char*> fetchedDataVec{};
        fetchedDataVec = rawImageBuffer.fetch();
        processData(fetchedDataVec);
    } // loop breaks upon user interception
}

int main()
{
    BoundedBuffer rawImageBuffer(6);
    thread consumer(consumerTask, ref(rawImageBuffer));
    thread producer(producerTask, ref(rawImageBuffer));
    consumer.join();
    producer.join();
    return 0;
}
Am I correct in my guess about why the exception is being thrown? How do I resolve this? For reference, each vector element contains data for a 2448px X 2048px image in RGBa 8bit format.
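To make my guess easier to reason about, here is a minimal sketch of what the buffer does with a vector of raw pointers. The helper name roundTrip is made up for this illustration and is not part of my real code; it pushes a vector through a std::queue the same way deposit and fetch do.

```cpp
#include <queue>
#include <vector>

// Hypothetical helper (illustration only): sends a vector of raw pointers
// through a std::queue, mirroring what deposit() and fetch() do.
std::vector<unsigned char*> roundTrip(const std::vector<unsigned char*>& in)
{
    std::queue<std::vector<unsigned char*>> q;
    q.push(in);                                   // copies the addresses, not the pixels
    std::vector<unsigned char*> out = q.front();  // another copy of the same addresses
    q.pop();
    return out;
}
```

The pointers that come out are the same addresses that went in, so the buffer never owns or duplicates the image bytes. Whether those addresses are still readable depends entirely on whoever owns the memory they point to.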
UPDATES:
1. After someone pointed out in the comments that the unsigned char* pointers could be invalid, I checked and found that the address held by each pointer is in fact a real memory location. In the exception Access violation reading location X, X is larger than the address held by the pointer.
2. After some more debugging, I've found that the memory pointed to by the unsigned char* in the unProcessedData vector in processData doesn't remain intact: the pointer address is correct, but some blocks of memory are unreadable. I found this by printing each char behind the unsigned char* in processData. When processData is called by the producer thread (this is when CUDA doesn't throw the exception), all chars get printed nicely (I'm printing 2048*2448*4 chars, dictated by the aforementioned image resolution and format). But when processData is called by the consumer thread, printing the chars throws the same exception, at around the 40th char (around the 40th, not always the 40th).
3. Okay, so now I'm pretty sure that my pointers point to real memory locations, and I also know that the first memory block pointed to by the pointer held the expected value every time I tested it. To test this, in producerTask I deliberately write a test value (such as int 42, or char *) to the 0th memory block pointed to by the unsigned char*. In the processData function, I check whether the memory block still contains the test value, and it does. So now I know that some of the memory blocks pointed to by the pointer become unreadable for some unknown reason. Also, my test doesn't prove that the first memory block is immune to becoming unreadable, just that it didn't become unreadable in the few tests I did. TLDR for Updates 1 to 3: The unprocessedImage pointers are valid, they point to a real memory address, and that address holds the expected value.
4. Another debugging attempt. Now I'm using Visual Studio's memory window to visually inspect the data. The debugger tells me that unProcessedData[0] points to 0x00000279d7c76070. This is what the memory around 0x00000279d7c76070 looks like:
[screenshot: Visual Studio memory window before the CUDA call]
The memory seems sensible: the RGBa format can be clearly seen, and the image is all black, so it makes sense that the RGB channels are close to 0 whereas alpha is ff. I scrolled down for a long time to see what the memory looks like, and all the way to 0x00000279D8F9606F the data looks good (RGBa values as expected). The 0x00000279D8F9606F number also makes sense because 0x00000279D8F9606F - 0x00000279d7c76070 = 20054015 (decimal), which means there are 20054016 valid chars, as expected (2048 height * 2448 width * 4 channels = 20054016). Okay, so far so good. Note that all this is right before running the CUDA function. After stepping through the CUDA function I get the same exception: Access violation reading location 0x00000279D80B8000. Note that 0x00000279D80B8000 is between 0x00000279d7c76070 and 0x00000279D8F9606F, the part of memory which I visually checked to be correct. Now, after running the CUDA function, here is what the memory between 0x00000279d7c76070 and 0x00000279D8F9606F looks like:
[screenshot: Visual Studio memory window after the CUDA call]
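As a sanity check on the address arithmetic in this update, here is a small sketch that lets the compiler do the subtraction. The constant and function names are mine, chosen only for illustration; the addresses are the ones reported by the debugger above.

```cpp
#include <cstdint>

// Addresses copied from the debugger session described above.
constexpr std::uint64_t kFirst = 0x00000279d7c76070ULL; // unProcessedData[0]
constexpr std::uint64_t kLast  = 0x00000279D8F9606FULL; // last byte that looked valid
constexpr std::uint64_t kFault = 0x00000279D80B8000ULL; // address in the exception

// Expected image size: height * width * channels (8-bit RGBa).
constexpr std::uint64_t kImageBytes = 2048ULL * 2448ULL * 4ULL;

// Number of bytes in the inclusive range [first, last].
constexpr std::uint64_t bytesInRange(std::uint64_t first, std::uint64_t last)
{
    return last - first + 1;
}

// True if addr lies inside the image buffer.
constexpr bool insideImage(std::uint64_t addr)
{
    return addr >= kFirst && addr <= kLast;
}

static_assert(kImageBytes == 20054016, "image size");
static_assert(bytesInRange(kFirst, kLast) == kImageBytes, "range covers one image");
static_assert(insideImage(kFault), "faulting address is inside the image");
```

All three static_asserts compile, which confirms that the faulting address falls inside the range that looked valid in the memory window just before the call.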
5. If I cout anything in processData before calling the CUDA function, the memory pointed to by the pointer changes. All the chars become equal to 0xdd, as can be seen in the image below. This page on MSDN says: "The freed blocks kept unused in the debug heap's linked list when the _CRTDBG_DELAY_FREE_MEM_DF flag is set are currently filled with 0xDD."
[screenshot: Visual Studio memory window showing the 0xdd bytes]
When I call processData from the producer thread, the pointed-to memory doesn't change after I cout anything.

Right now the most upvoted comment on this question is telling me to learn more about pointers. I am doing this currently (hopefully as my updates suggest); however, what topics do I need to learn about them? I do know how pointers work. I know my pointers are pointing to a valid memory location (see Update 2). I know some memory blocks pointed to by the pointer become unreadable (see Update 3). But I don't know why the memory blocks become unreadable. In particular, I don't know why they only become unreadable when processData is called from the consumer thread (note that no exception is thrown when processData is called from the producer thread). Is there anything else I can do to help narrow down this problem?
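One thing that could be automated is the 0xDD observation. The sketch below is only a debugging aid under stated assumptions: looksFreed and kFreedFill are hypothetical names, the fill value is taken from the MSDN quote above, and the check only means something in a debug build. Reading memory that has really been freed is undefined behavior, so this belongs in a debugging session rather than in production code.

```cpp
#include <algorithm>
#include <cstddef>

// Fill byte the debug heap uses for freed blocks, per the MSDN quote above.
constexpr unsigned char kFreedFill = 0xDD;

// Hypothetical helper: returns true if the first `count` bytes at `data`
// are all 0xDD, which hints that the block was freed under the debug heap.
bool looksFreed(const unsigned char* data, std::size_t count)
{
    if (data == nullptr || count == 0)
        return false;
    return std::all_of(data, data + count,
                       [](unsigned char b) { return b == kFreedFill; });
}
```

Calling something like looksFreed(unProcessedData[0], 64) at the top of processData would flag the problem before the CUDA call. An all-black RGBa image has bytes near 00 and ff, so a run of 0xDD is unlikely to be real pixel data here.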
The problem was fairly simple. n.m.'s comments guided me in the right direction, and I'm thankful for that.
In my updates I mentioned that printing anything using cout caused the data to become corrupt. It seemed like that was happening, but after putting some breakpoints in fetch and deposit, I got a complete picture of what was really going on.
I produced the image data using another SDK supplied with the camera, which gave me the image data as a wrapped pointer. I then converted the image format and unwrapped the converted image to get the pointer to the raw image. That raw pointer was stored in producedDataVec and deposited into rawImageBuffer. The problem was that as soon as the converted image went out of scope, my data became corrupted. So the cout statements weren't really responsible for corrupting my data. With breakpoints placed everywhere, I could see the data becoming corrupt just after the converted image went out of scope. To resolve this, my producer now deposits the wrapped pointer directly into the buffer. The consumer fetches the wrapped pointer, obtains the converted image by converting the format in the consumer, and then gets the raw image pointer from it. Now the converted image only goes out of scope after processData has returned, so the exception is never thrown.
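A minimal sketch of the consumer-side fix, under several assumptions: WrappedImage is a hypothetical stand-in for the camera SDK's wrapped image type (the real SDK type and its method names differ), std::shared_ptr is my choice for the "wrapped pointer", a plain std::queue stands in for BoundedBuffer with the locking omitted, and processData here just sums bytes so the sketch has something observable.

```cpp
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for the SDK's wrapped image. It owns its pixels,
// so data() is only valid while the WrappedImage object is alive.
class WrappedImage {
public:
    explicit WrappedImage(std::vector<unsigned char> pixels)
        : pixels_(std::move(pixels)) {}
    // Placeholder for the format conversion: returns a new owning image.
    WrappedImage convert() const { return WrappedImage(pixels_); }
    unsigned char* data() { return pixels_.data(); }
    std::size_t size() const { return pixels_.size(); }
private:
    std::vector<unsigned char> pixels_;
};

using ImagePtr = std::shared_ptr<WrappedImage>;

// Placeholder for the real processData: sums every byte of every frame.
unsigned long processData(const std::vector<unsigned char*>& frames,
                          std::size_t bytesPerFrame)
{
    unsigned long sum = 0;
    for (unsigned char* frame : frames)
        for (std::size_t i = 0; i < bytesPerFrame; ++i)
            sum += frame[i];
    return sum;
}

// One consumer step: fetch wrapped images, convert them here, and keep the
// converted images in scope until processData has returned.
unsigned long consumeOne(std::queue<std::vector<ImagePtr>>& buffer)
{
    std::vector<ImagePtr> wrapped = std::move(buffer.front());
    buffer.pop();

    std::vector<WrappedImage> converted;   // owns the converted pixels
    converted.reserve(wrapped.size());
    std::vector<unsigned char*> raw;       // non-owning views into `converted`
    for (const ImagePtr& img : wrapped) {
        converted.push_back(img->convert());
        raw.push_back(converted.back().data());
    }

    // Assumes every frame in the batch has the same size.
    const std::size_t bytesPerFrame = converted.empty() ? 0 : converted.front().size();
    return processData(raw, bytesPerFrame); // `converted` is still alive here
}
```

The point of the design is that the object owning the pixels (converted) and the raw pointers derived from it (raw) live in the same scope, so the pointers cannot outlive the data. What travels through the buffer is the owning wrapper rather than a bare unsigned char*.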