Search code examples
cmallocrealloc

Why is realloc eating tons of memory?


This question is a bit long due the source code, which I tried to simplify as much as possible. Please bear with me and thanks for reading along.

I have an application with a loop that runs potentially millions of times. Instead of several thousands to millions of malloc/free calls within that loop, I would like to do one malloc up front and then several thousands to millions of realloc calls.

But I'm running into a problem where my application consumes several GB of memory and kills itself, when I am using realloc. If I use malloc, my memory usage is fine.

If I run on smaller test data sets with valgrind's memtest, it reports no memory leaks with either malloc or realloc.

I have verified that I am matching every malloc-ed (and then realloc-ed) object with a corresponding free.

So, in theory, I am not leaking memory, it is just that using realloc seems to consume all of my available RAM, and I'd like to know why and what I can do to fix this.

What I have initially is something like this, which uses malloc and works properly:

Malloc code

void A () {
    do {
        B();
    } while (someConditionThatIsTrueForMillionInstances);
}

void B () {
    char *firstString = NULL;
    char *secondString = NULL;
    char *someOtherString;

    /* populate someOtherString with data from stream, for example */

    C((const char *)someOtherString, &firstString, &secondString);

    fprintf(stderr, "first: [%s] | second: [%s]\n", firstString, secondString);

    if (firstString)
        free(firstString);
    if (secondString)
        free(secondString);
}

void C (const char *someOtherString, char **firstString, char **secondString) {
    char firstBuffer[BUFLENGTH];
    char secondBuffer[BUFLENGTH];

    /* populate buffers with some data from tokenizing someOtherString in a special way */

    *firstString = malloc(strlen(firstBuffer)+1);
    strncpy(*firstString, firstBuffer, strlen(firstBuffer)+1);

    *secondString = malloc(strlen(secondBuffer)+1);
    strncpy(*secondString, secondBuffer, strlen(secondBuffer)+1);
}

This works fine. But I want something faster.

Now I test a realloc arrangement, which malloc-s only once:

Realloc code

void A () {
    char *firstString = NULL;
    char *secondString = NULL;

    do {
        B(&firstString, &secondString);
    } while (someConditionThatIsTrueForMillionInstances);

    if (firstString)
        free(firstString);
    if (secondString)
        free(secondString);
}

void B (char **firstString, char **secondString) {
    char *someOtherString;

    /* populate someOtherString with data from stream, for example */

    C((const char *)someOtherString, &(*firstString), &(*secondString));

    fprintf(stderr, "first: [%s] | second: [%s]\n", *firstString, *secondString);
}

void C (const char *someOtherString, char **firstString, char **secondString) {
    char firstBuffer[BUFLENGTH];
    char secondBuffer[BUFLENGTH];

    /* populate buffers with some data from tokenizing someOtherString in a special way */

    /* realloc should act as malloc on first pass through */

    *firstString = realloc(*firstString, strlen(firstBuffer)+1);
    strncpy(*firstString, firstBuffer, strlen(firstBuffer)+1);

    *secondString = realloc(*secondString, strlen(secondBuffer)+1);
    strncpy(*secondString, secondBuffer, strlen(secondBuffer)+1);
}

If I look at the output of free -m on the command-line while I run this realloc-based test with a large data set that causes the million-loop condition, my memory goes from 4 GB down to 0 and the app crashes.

What am I missing about using realloc that is causing this? Sorry if this is a dumb question, and thanks in advance for your advice.


Solution

  • realloc has to copy the contents from the old buffer to the new buffer if the resizing operation cannot be done in place. A malloc/free pair can be better than a realloc if you don't need to keep around the original memory.

    That's why realloc can temporarily require more memory than a malloc/free pair. You are also encouraging fragmentation by continuously interleaving reallocs. I.e., you are basically doing:

    malloc(A);
    malloc(B);
    
    while (...)
    {
        malloc(A_temp);
        free(A);
        A= A_temp;
        malloc(B_temp);
        free(B);
        B= B_temp;
    }
    

    Whereas the original code does:

    while (...)
    {
        malloc(A);
        malloc(B);
        free(A);
        free(B);
    }
    

    At the end of each of the second loop you have cleaned up all the memory you used; that's more likely to return the global memory heap to a clean state than by interleaving memory allocations without completely freeing all of them.