I have a large number of images (roughly 2500) that contain chessboards used for camera calibration. I'm launching several background threads that call cv::findChessboardCornersSB
to find the chessboard corners in these images. While tuning the number of threads to launch, I noticed that quite a large amount of memory remained permanently allocated even after all analysis was complete.
After testing with 1, 5, 10, 15, and 20 threads, I've found a definite correlation between the number of threads launched and the amount of memory left allocated when analysis is complete, and the amounts are large. For example, launching 20 threads to analyze these 2500 chessboard images leaves 3760 MiB allocated after completion. Please see the graph below.
I know that my code isn't leaking, because I can continuously queue up analysis of more and more images indefinitely, and the memory consumption, while high, does not increase over time.
I'm hoping someone who knows OpenCV better than I do can explain what's causing this, and tell me whether I can do anything to free this memory.
As requested, here is a minimal reproducible example:
// Headers needed to compile this example
#include <atomic>
#include <cstdint>
#include <vector>
#include <pthread.h>
#include <unistd.h>
#include <boost/thread/mutex.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgcodecs.hpp>
// Test controls
#define TEST_THREADS_COUNT (15)
#define TEST_IMAGES_COUNT (3000)
// The structure that contains the data needed by the background thread to perform the chessboard corner finding
typedef struct {
uint64_t uID;
cv::Mat FrameImage;
int nRows; int nCols;
} ChessboardCornerFindingTaskThreadData;
// The vector of tasks waiting to be completed, and the mutex that protects access to it
std::vector<ChessboardCornerFindingTaskThreadData> *g_pWaitingTasksVector = NULL;
boost::mutex g_WaitingTasksVectorMutex;
// Whether or not the test is ready to start, and whether or not it's been completed
std::atomic<bool> g_bTestReadyToStart(false);
std::atomic<bool> g_bTestComplete(false);
// The function to run in each background thread
void* TestThreadFunc(void *pArg)
{
// Variables used in each loop iteration
bool bHasTask = false;
bool bLastTask = false;
bool bChessboardFound = false;
ChessboardCornerFindingTaskThreadData TaskToComplete;
// While we're busy analyzing frames
while (true)
{
// If the test is done
if (g_bTestComplete.load())
{
// Break out of this thread's loop
break;
}
// We have not yet determined that a task is available
bHasTask = false;
// If we're ready to start
if (g_bTestReadyToStart.load())
{
// Lock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.lock();
// If a task is available
if ((NULL != g_pWaitingTasksVector) && (0 < g_pWaitingTasksVector->size()))
{
// Get a copy of the task to complete
TaskToComplete = (*g_pWaitingTasksVector)[0];
// Remove this one from the vector
g_pWaitingTasksVector->erase(g_pWaitingTasksVector->begin());
// Request minimal vector allocation
g_pWaitingTasksVector->shrink_to_fit();
// Set that we have a task for this loop iteration
bHasTask = true;
}
// Unlock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.unlock();
}
// If we have a task to perform
if (bHasTask)
{
// Execute the chessboard corner finding
std::vector<cv::Point2f> FoundCorners;
bChessboardFound = cv::findChessboardCornersSB(TaskToComplete.FrameImage, cv::Size(TaskToComplete.nCols, TaskToComplete.nRows), FoundCorners, (cv::CALIB_CB_NORMALIZE_IMAGE + cv::CALIB_CB_EXHAUSTIVE));
// Release this frame's data
TaskToComplete.FrameImage.release();
// Get whether or not this is the last task
bLastTask = (TaskToComplete.uID >= TEST_IMAGES_COUNT);
}
// Or, if we don't have a task to perform right now
else
{
// Wait before checking again
usleep(1000);
}
// If this is the last task
if (bLastTask)
{
// Set that the test is complete so that other threads can return from their loops
g_bTestComplete.store(true);
// Lock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.lock();
// If the vector is still allocated
if (NULL != g_pWaitingTasksVector)
{
// Delete it
delete g_pWaitingTasksVector;
g_pWaitingTasksVector = NULL;
}
// Unlock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.unlock();
// Break out of this thread's loop
break;
}
}
// Nothing to return here
return NULL;
}
// Start the test
void StartTest()
{
// Create the vector of tasks that the background threads will complete
g_pWaitingTasksVector = new std::vector<ChessboardCornerFindingTaskThreadData>;
// We are not yet ready to start until all tasks have been queued up
g_bTestReadyToStart.store(false);
// The test is not complete
g_bTestComplete.store(false);
// For each thread to launch
for (int i = 0; i < TEST_THREADS_COUNT; ++i)
{
// Launch this background thread
pthread_t ThisThread;
pthread_create((&ThisThread), NULL, TestThreadFunc, NULL);
}
// An ID to assign to each task
uint64_t uID = 0;
// For each test image
for (int iTest = 0; iTest < TEST_IMAGES_COUNT; ++iTest)
{
// Increment the task ID
++uID;
// Fill in a data structure that the background thread will use to perform its analysis
ChessboardCornerFindingTaskThreadData ThisTask;
ThisTask.uID = uID;
ThisTask.FrameImage = cv::imread(std::string("/media/images/TestFrame.png"), cv::IMREAD_GRAYSCALE);
ThisTask.nRows = 50;
ThisTask.nCols = 17;
// Add this task to the vector
g_WaitingTasksVectorMutex.lock();
g_pWaitingTasksVector->push_back(ThisTask);
g_WaitingTasksVectorMutex.unlock();
}
// Flag the waiting background threads that they should begin analysis
g_bTestReadyToStart.store(true);
}
No, it doesn't allocate memory permanently.
Your source code is quite transparent: it only uses dynamically allocated memory, and that memory is released by the time execution is done.
VmRSS (resident set size) tells you how much of the process's memory (out of all its virtual memory) currently resides in physical memory. So it is quite normal that as your process uses memory, that memory is brought into physical memory and VmRSS grows.
VmRSS is an inaccurate metric for measuring used memory. See the Linux manual page proc_pid_status(5):
VmRSS: Resident set size. Note that the value here is the sum of RssAnon, RssFile, and RssShmem. This value is inaccurate; see /proc/pid/statm above.
You also shouldn't rely on shrink_to_fit() here:
// Request minimal vector allocation
g_pWaitingTasksVector->shrink_to_fit();
since, per the documentation for shrink_to_fit:
It is a non-binding request to reduce capacity() to size(). It depends on the implementation whether the request is fulfilled.
And as a note, your reasoning about the absence of leaks is incorrect. Leaks can be "conditional": in your case, a leak could occur once per thread (when a thread starts, or the first time it uses something), or something leak-prone could be kept alive until the end of the loop, such as a pointer that is updated on every iteration but whose memory is never released.