c++ multithreading heap-memory

C++ Memory De-allocation in Large Lists with Threads on Fedora 35


I am experiencing an issue related to memory management in C++ on a Fedora 35 system with 8GB of memory. Specifically, I am working with two threads where each thread is supposed to allocate approximately 4GB of memory, use it, and then deallocate it. The threads are controlled with a std::mutex to ensure that only one thread is allocating or deallocating memory at a time.

Here is the code I am working with:

#include <thread>
#include <list>
#include <string>
#include <mutex>

std::mutex mtx;

// Function for first thread
void manageList1() {
    mtx.lock();
    {
        std::list<std::string> myList1;

        for(int i = 0; i < 50000; ++i) {
            myList1.push_back(std::string(80000, 'a')); // Approx. 4GB
        }

        myList1.clear(); // Clear list
    }
    mtx.unlock();

    while(true){} // Keep thread 1 from exiting
}

// Function for second thread
void manageList2() {
    mtx.lock();
    {
        std::list<std::string> myList2;
        for(int i = 0; i < 50000; ++i) {
            myList2.push_back(std::string(80000, 'a')); // Approx. 4GB
        }

        myList2.clear(); // Clear list
    }
    mtx.unlock();
}

int main() {
    std::thread listThread1(manageList1);
    std::thread listThread2(manageList2);

    // Don't join listThread1
    listThread2.join();

    return 0;
}

I expected the total memory usage of this program to not exceed approximately 4GB (plus the overhead for the program and threads) since one thread should free its memory before the other begins allocating. However, what I observe is that the memory usage gradually increases until the system runs out of memory and kills the process.

I am aware that in C++, memory that is freed is not necessarily immediately returned to the operating system, and it might be kept around for future allocations by the same program. However, in this case, it seems like the memory from the first thread isn't being reused by the second thread, which leads to excessive memory usage.

I would appreciate any insights into this problem. Why is the memory not being freed as expected? Is there a way to ensure the memory is freed immediately after it's no longer needed?


Solution

  • Based on discussion in the comments, this is my best guess:

    You are using glibc's malloc implementation, which is where the memory allocation requests for your strings ultimately end up. (glibc is the most common implementation of the C standard library on Linux distributions, but there are others, such as musl.)

    The behavior you are seeing is a side effect of how this particular malloc is implemented. In particular, an 80000-byte request is below the threshold at which malloc gives each allocation its own mmap region and returns that region to the system as soon as it is freed. That threshold defaults to 128 KiB and can be changed in code (using mallopt with M_MMAP_THRESHOLD, which is non-portable) or with the environment variable MALLOC_MMAP_THRESHOLD_.
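    As a rough illustration of the mallopt route, here is a minimal sketch. The helper name lowerMmapThreshold is made up for this example, and 64 KiB is just one value below the roughly 80000 bytes each string requests. This only works with glibc, and I have not measured its effect on your program.

```cpp
#include <malloc.h>   // glibc-specific: mallopt, M_MMAP_THRESHOLD
#include <cstddef>

// Hypothetical helper (not part of glibc): ask malloc to serve every request
// of at least `bytes` bytes with its own mmap region, so that freeing the
// block unmaps it instead of keeping it in an arena.
bool lowerMmapThreshold(std::size_t bytes) {
    // mallopt returns 1 on success and 0 on error.
    return mallopt(M_MMAP_THRESHOLD, static_cast<int>(bytes)) == 1;
}
```

    Calling lowerMmapThreshold(64 * 1024) at the start of main, before the threads are created, should push the string buffers onto the mmap path. Keep in mind that glibc also caps the number of simultaneous mmap-backed allocations (M_MMAP_MAX, 65536 by default); 50000 strings stay below that cap.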

    So, the usual chunk-based arena allocator that is used for all smaller allocations is used. By default malloc will use multiple arenas up to some upper limit and try to assign different threads different arenas so their allocations don't interfere.
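    If the per-thread arenas are indeed the cause, one thing to try is capping the number of arenas so that both threads share a single heap. This is a sketch under that assumption: limitArenas is a name invented for this example, M_ARENA_MAX is a glibc-only mallopt parameter, and the same limit can be set from the environment with MALLOC_ARENA_MAX. Fewer arenas means more lock contention between threads that allocate concurrently.

```cpp
#include <malloc.h>   // glibc-specific: mallopt, M_ARENA_MAX

// Hypothetical helper (not part of glibc): cap the number of malloc arenas.
// With a cap of 1, all threads allocate from the main arena, so memory freed
// by one thread can be reused by the next one.
bool limitArenas(int maxArenas) {
    // Intended to be called early, before other threads start allocating.
    return mallopt(M_ARENA_MAX, maxArenas) == 1;
}
```

    You would call limitArenas(1) as the first statement of main, before constructing listThread1 and listThread2.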

    Also, the malloc implementation does not necessarily release free memory at the top of an arena's heap back to the system right after a free. It delays this until a later point, e.g. a later call to malloc, in order to avoid releasing and then immediately reacquiring memory, and maybe also to keep free as fast as possible.

    So, it seems likely that after the closing } in your thread, the list has been completely deallocated by calls to glibc's free, but free decided to not yet release the memory back to the system.

    Because the other thread then operates in its own arena with its own heap, it neither reuses the first thread's freed memory nor triggers its release back to the operating system, and you end up needing double the amount of memory that you expected.

    You can force the malloc implementation to release all memory back to the operating system after the } with a call to malloc_trim(0). Of course that is non-portable and will only work on systems using glibc or a compatible alternative.
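    A minimal sketch of where that call would go, using a made-up helper fillAndRelease in place of your thread functions (locking omitted for brevity). According to the malloc_trim man page, since glibc 2.8 the function releases free memory in all arenas, so it should not matter which thread calls it.

```cpp
#include <malloc.h>   // glibc-specific: malloc_trim
#include <cstddef>
#include <list>
#include <string>

// Hypothetical helper (not from the question): fill a list, destroy it, then
// ask glibc to hand the freed pages back to the operating system.
int fillAndRelease(int count, std::size_t size) {
    {
        std::list<std::string> items;
        for (int i = 0; i < count; ++i) {
            items.emplace_back(size, 'a');
        }
    } // the list and all strings are destroyed here via free

    // Returns 1 if some memory was released to the system, 0 otherwise.
    return malloc_trim(0);
}
```

    In your code the equivalent change is a single malloc_trim(0) between the closing } of the inner block and mtx.unlock() in both manageList1 and manageList2.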


    I don't know much about glibc's malloc implementation, so there might be wrong points in the explanation above. An overview of the implementation can be found at https://sourceware.org/glibc/wiki/MallocInternals.