I have a large number of images (roughly 2500) that contain chessboards used for camera calibration. I'm launching several background threads that call cv::findChessboardCornersSB
to find the chessboard corners in these images. While tuning the number of threads to launch, I noticed that quite a large amount of memory remained permanently allocated even after all analysis was complete.
After testing with 1, 5, 10, 15, and 20 threads, I've found a definite correlation between the number of threads launched and the amount of memory left allocated when analysis is complete, and the amounts are large. For example, launching 20 threads to analyze these 2500 chessboard images leaves 3760 MiB allocated after completion. Please see the graph below.
I know that my code isn't leaking, because I can continuously queue up analysis of more and more images indefinitely, and the memory consumption, while high, does not increase over time.
I'm hoping someone who knows OpenCV better than I do can explain what's causing this, and tell me whether I can do anything to free this memory.
As requested, here is a minimal reproducible example:
// Headers needed to compile this example
#include <atomic>
#include <cstdint>
#include <vector>
#include <pthread.h>
#include <unistd.h>
#include <boost/thread/mutex.hpp>
#include <opencv2/calib3d.hpp>
#include <opencv2/imgcodecs.hpp>
// Test controls
#define TEST_THREADS_COUNT (15)
#define TEST_IMAGES_COUNT (3000)
// The structure that contains the data needed by the background thread to perform the chessboard corner finding
typedef struct {
uint64_t uID;
cv::Mat FrameImage;
int nRows; int nCols;
} ChessboardCornerFindingTaskThreadData;
// The vector of tasks waiting to be completed, and the mutex that protects access to it
std::vector<ChessboardCornerFindingTaskThreadData> *g_pWaitingTasksVector = NULL;
boost::mutex g_WaitingTasksVectorMutex;
// Whether or not the test is ready to start, and whether or not it's been completed
std::atomic<bool> g_bTestReadyToStart(false);
std::atomic<bool> g_bTestComplete(false);
// The function to run in each background thread
void* TestThreadFunc(void *pArg)
{
// Variables used in each loop iteration
bool bHasTask = false;
bool bLastTask = false;
bool bChessboardFound = false;
ChessboardCornerFindingTaskThreadData TaskToComplete;
// While we're busy analyzing frames
while (true)
{
// If the test is done
if (g_bTestComplete.load())
{
// Break out of this thread's loop
break;
}
// We have not yet determined that a task is available
bHasTask = false;
// If we're ready to start
if (g_bTestReadyToStart.load())
{
// Lock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.lock();
// If a task is available
if ((NULL != g_pWaitingTasksVector) && (0 < g_pWaitingTasksVector->size()))
{
// Get a copy of the task to complete
TaskToComplete = (*g_pWaitingTasksVector)[0];
// Remove this one from the vector
g_pWaitingTasksVector->erase(g_pWaitingTasksVector->begin());
// Request minimal vector allocation
g_pWaitingTasksVector->shrink_to_fit();
// Set that we have a task for this loop iteration
bHasTask = true;
}
// Unlock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.unlock();
}
// If we have a task to perform
if (bHasTask)
{
// Execute the chessboard corner finding
std::vector<cv::Point2f> FoundCorners;
bChessboardFound = cv::findChessboardCornersSB(TaskToComplete.FrameImage, cv::Size(TaskToComplete.nCols, TaskToComplete.nRows), FoundCorners, (cv::CALIB_CB_NORMALIZE_IMAGE + cv::CALIB_CB_EXHAUSTIVE));
// Release this frame's data
TaskToComplete.FrameImage.release();
// Get whether or not this is the last task
bLastTask = (TaskToComplete.uID >= TEST_IMAGES_COUNT);
}
// Or, if we don't have a task to perform right now
else
{
// Wait before checking again
usleep(1000);
}
// If this is the last task
if (bLastTask)
{
// Set that the test is complete so that other threads can return from their loops
g_bTestComplete.store(true);
// Lock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.lock();
// If the vector is still allocated
if (NULL != g_pWaitingTasksVector)
{
// Delete it
delete g_pWaitingTasksVector;
g_pWaitingTasksVector = NULL;
}
// Unlock the mutex that protects access to the vector of tasks
g_WaitingTasksVectorMutex.unlock();
// Break out of this thread's loop
break;
}
}
// Nothing to return here
return NULL;
}
// Start the test
void StartTest()
{
// Create the vector of tasks that the background threads will complete
g_pWaitingTasksVector = new std::vector<ChessboardCornerFindingTaskThreadData>;
// We are not yet ready to start until all tasks have been queued up
g_bTestReadyToStart.store(false);
// The test is not complete
g_bTestComplete.store(false);
// For each thread to launch
for (int i = 0; i < TEST_THREADS_COUNT; ++i)
{
// Launch this background thread
pthread_t ThisThread;
pthread_create((&ThisThread), NULL, TestThreadFunc, NULL);
}
// An ID to assign to each task
uint64_t uID = 0;
// For each test image
for (int iTest = 0; iTest < TEST_IMAGES_COUNT; ++iTest)
{
// Increment the task ID
++uID;
// Fill in a data structure that the background thread will use to perform its analysis
ChessboardCornerFindingTaskThreadData ThisTask;
ThisTask.uID = uID;
ThisTask.FrameImage = cv::imread(std::string("/media/images/TestFrame.png"), cv::IMREAD_GRAYSCALE);
ThisTask.nRows = 50;
ThisTask.nCols = 17;
// Add this task to the vector
g_WaitingTasksVectorMutex.lock();
g_pWaitingTasksVector->push_back(ThisTask);
g_WaitingTasksVectorMutex.unlock();
}
// Flag the waiting background threads that they should begin analysis
g_bTestReadyToStart.store(true);
}
No, it doesn't allocate memory permanently.
Your source code is quite transparent: it only uses dynamically allocated memory, and that memory is released by the time execution is done.
VmRSS (resident set size) tells you how much of the process's memory (out of all its virtual memory) currently resides in physical memory. So it is quite normal that as your process uses memory, that memory is brought into physical memory and VmRSS grows.
VmRSS is an inaccurate metric for measuring used memory. See the Linux manual page proc_pid_status(5):
VmRSS: Resident set size. Note that the value here is the sum of RssAnon, RssFile, and RssShmem. This value is inaccurate; see /proc/pid/statm above.
You also shouldn't rely on shrink_to_fit() here:
// Request minimal vector allocation
g_pWaitingTasksVector->shrink_to_fit();
since, per the documentation for shrink_to_fit:
It is a non-binding request to reduce capacity() to size(). It depends on the implementation whether the request is fulfilled.
And as a note, your reasoning about the absence of leaks is incorrect. Leaks can be "conditional": in your case, a leak could occur once per thread (when a thread starts, or the first time it uses something), or something leak-prone could be kept alive until the end of the loop, such as a pointer that is updated on every iteration but whose memory is never released.