free() returns memory to the OS

My test code shows that after free() and before the program exits, the heap memory is returned to the OS. I use htop (top shows the same) to observe the behaviour. My glibc version, as reported by ldd, is (Ubuntu GLIBC 2.31-0ubuntu9.9) 2.31.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>

#define BUFSIZE 10737418240UL // 10 GiB

int main(){
    printf("start\n");
    uint32_t* p = (uint32_t*)malloc(BUFSIZE);
    if (p == NULL){
        printf("alloc 10GB failed\n");
        exit(1);
    }
    memset(p, 0, BUFSIZE);
    for(size_t i = 0; i < (BUFSIZE / 4); i++){
        p[i] = 10;
    }
    printf("before free\n");
    free(p);
    sleep(1000);
    printf("exit\n");
}

Why does the question "Why does the free() function not return memory to the operating system?" observe the opposite behaviour from mine? That OP also uses Linux, and the question was asked in 2018. Am I missing something?

Solution

I did some experiments, read a chapter of The Linux Programming Interface, and arrived at an answer that satisfies me.

First, my conclusions:

The library call malloc uses the system calls brk and mmap under the hood when allocating memory.
As @John Zwinck describes, malloc in a Linux process chooses between brk and mmap depending on how much memory you request.
If the memory was allocated with brk, the process will probably not return it to the OS before it terminates (though sometimes it does). If it was allocated with mmap, then in my simple test the process returns it to the OS before it terminates.

Experiment code (examine memory stats in htop at the same time):

code sample 1:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>

#define BUFSIZE 1073741824 //1GiB

// run `ulimit -s unlimited` first

int main(){
    printf("start\n");
    printf("%zu \n", sizeof(uint32_t));
    // 2^28 pointers of 8 bytes each = 2 GiB on the stack, hence the ulimit above
    uint32_t* p_arr[BUFSIZE / 4];
    sleep(10); 
    for(size_t i = 0; i < (BUFSIZE / 4); i++){
        uint32_t* p = (uint32_t*)malloc(sizeof(uint32_t));
        if (p == NULL){
            printf("alloc failed\n");
            exit(1);
        }
        p_arr[i] = p;
    } 
    printf("alloc done\n"); 
    for(size_t i = 0; i < (BUFSIZE / 4); i++){
        free(p_arr[i]);
    }
    
    printf("free done\n");
    sleep(20);
    printf("exit\n");
}

When the program prints "free done" and enters sleep(), you can see that it still holds the memory and has not returned it to the OS. strace ./a.out shows brk being called many times.

Note:

I call malloc in a loop to allocate the memory. I expected this to take up only 1GiB of RAM, but in fact it takes up 8GiB in total, because malloc adds extra bytes to every block for bookkeeping and alignment. One should never allocate 1GiB this way, in a loop of tiny allocations.

code sample 2:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>

#define BUFSIZE 1073741824 //1GiB

int main(){
    printf("start\n");
    printf("%zu \n", sizeof(uint32_t));
    // 2^28 pointers of 8 bytes each = 2 GiB on the stack, run `ulimit -s unlimited` first
    uint32_t* p_arr[BUFSIZE / 4];
    sleep(3); 
    for(size_t i = 0; i < (BUFSIZE / 4); i++){
        uint32_t* p = (uint32_t*)malloc(sizeof(uint32_t));
        if (p == NULL){
            printf("alloc failed\n");
            exit(1);
        }
        p_arr[i] = p;
    } 
    printf("%p\n", (void*)p_arr[0]);
    printf("alloc done\n"); 
    for(size_t i = 0; i < (BUFSIZE / 4); i++){
        free(p_arr[i]);
    }
    printf("free done\n");
    printf("allocate again\n");
    sleep(10);
    for(size_t i = 0; i < (BUFSIZE / 4); i++){
        uint32_t* p = (uint32_t*)malloc(sizeof(uint32_t));
        if (p == NULL){
            printf("alloc failed\n");
            exit(1);
        }
        p_arr[i] = p;
    } 
    printf("allocate again done\n");
    sleep(10);
    // print the address before freeing, so a still-valid pointer is printed
    printf("%p\n", (void*)p_arr[0]);
    for(size_t i = 0; i < (BUFSIZE / 4); i++){
        free(p_arr[i]);
    }
    sleep(3);
    printf("exit\n");
}

This one is similar to sample 1, but it allocates again after the free. The second round of allocations doesn't increase memory usage; it reuses the memory that was freed but not returned to the OS.

code sample 3:

#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#define MAX_ALLOCS 1000000

int main(int argc, char* argv[]){
    int freeStep, freeMin, freeMax, blockSize, numAllocs, j;
    char* ptr[MAX_ALLOCS];
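    if (argc < 3) {
        fprintf(stderr, "Usage: %s num-allocs block-size [step [min [max]]]\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    printf("\n");
    numAllocs = atoi(argv[1]);
    assert(numAllocs > 0 && numAllocs <= MAX_ALLOCS);
    blockSize = atoi(argv[2]);
    freeStep = (argc > 3) ? atoi(argv[3]) : 1;
    freeMin = (argc > 4) ? atoi(argv[4]) : 1;
    freeMax = (argc > 5) ? atoi(argv[5]) : numAllocs;
    assert(freeStep >= 1 && freeMin >= 1 && freeMax <= numAllocs);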

    printf("Initial program break:   %10p\n", sbrk(0));
    printf("Allocating %d*%d bytes\n", numAllocs, blockSize);
    for(j = 0; j < numAllocs; j++){
        ptr[j] = malloc(blockSize);
        if(ptr[j] == NULL){
            perror("malloc return NULL");
            exit(EXIT_FAILURE);
        }
    }

    printf("Program break is now:    %10p\n", sbrk(0));
    printf("Freeing blocks from %d to %d in steps of %d\n", freeMin, freeMax, freeStep);
    for(j = freeMin - 1; j < freeMax; j += freeStep){
        free(ptr[j]);
    }
    printf("After free(), program break is : %10p\n", sbrk(0));
    printf("\n");
    exit(EXIT_SUCCESS);
}

This one is taken from The Linux Programming Interface; I simplified it a bit.

Chapter 7:

The first two command-line arguments specify the number and size of blocks to allocate. The third command-line argument specifies the loop step unit to be used when freeing memory blocks. If we specify 1 here (which is also the default if this argument is omitted), then the program frees every memory block; if 2, then every second allocated block; and so on. The fourth and fifth command-line arguments specify the range of blocks that we wish to free. If these arguments are omitted, then all allocated blocks (in steps given by the third command-line argument) are freed.

Try running it with:

./free_and_sbrk 1000 10240 2
./free_and_sbrk 1000 10240 1 1 999
./free_and_sbrk 1000 10240 1 500 1000

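You will see that the program break decreases only in the last example, i.e. only there does the process return some blocks of memory to the OS (if I understand correctly). That is the one run in which the freed blocks form a contiguous range at the top of the heap.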

This sample code is evidence of

"If allocating by brk, the process is probably not returning the memory to the OS before it terminates (sometimes it does)."

Finally, here are some useful paragraphs quoted from the book. I suggest reading Chapter 7 (section 7.1) of TLPI; it is very helpful.

In general, free() doesn’t lower the program break, but instead adds the block of memory to a list of free blocks that are recycled by future calls to malloc(). This is done for several reasons:

The block of memory being freed is typically somewhere in the middle of the heap, rather than at the end, so that lowering the program break is not possible.

It minimizes the number of sbrk() calls that the program must perform. (As noted in Section 3.1, system calls have a small but significant overhead.)

In many cases, lowering the break would not help programs that allocate large amounts of memory, since they typically tend to hold on to allocated memory or repeatedly release and reallocate memory, rather than release it all and then continue to run for an extended period of time.

What is program break (also from the book):

Also: https://www.wikiwand.com/en/Data_segment