Tags: c++, memory, global-variables

Why don't global arrays consume memory in read-only mode?


The following code declares a global array (256 MiB) and calculates the sum of its items.

This program consumes 188 KiB when running:

#include <cstdlib>
#include <iostream>

using namespace std;

const unsigned int buffer_len = 256 * 1024 * 1024; // 256 MiB
unsigned char buffer[buffer_len];

int main()
{
    while(true)
    {
        int sum = 0;
        for(int i = 0; i < buffer_len; i++)
            sum += buffer[i];

        cout << "Sum: " << sum << endl;
    }
    return 0;
}

The following code is the same as the code above, but it sets the array elements to random values before calculating their sum.

This program consumes 256.1 MiB when running:

#include <cstdlib>
#include <iostream>

using namespace std;

const unsigned int buffer_len = 256 * 1024 * 1024; // 256 MiB
unsigned char buffer[buffer_len];

int main()
{
    while(true)
    {
        // ********** Changing array items.
        for(int i = 0; i < buffer_len; i++)
            buffer[i] = std::rand() % 256;
        // **********

        long long sum = 0; // int would overflow: the sum can reach 255 * 256 Mi
        for(int i = 0; i < buffer_len; i++)
            sum += buffer[i];

        cout << "Sum: " << sum << endl;
    }
    return 0;
}

Why doesn't a global array consume memory when it is only read (188 KiB vs 256 MiB)?

  • My compiler: GCC
  • My OS: Ubuntu 20.04

Update:

In my real scenario I will generate the buffer with the xxd command, so its elements are not zero:

$ xxd -i buffer.dat buffer.cpp
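For context, `xxd -i` writes a C array plus a length variable, both named after the input file (`buffer.dat` becomes `buffer_dat` and `buffer_dat_len`). The sketch below is a hypothetical stand-in for that generated file, with four made-up bytes instead of real contents, plus a hypothetical `sum_bytes` helper that sums the embedded bytes the way the question's programs sum `buffer`.

```cpp
#include <cstddef>

// Hypothetical stand-in for what `xxd -i buffer.dat buffer.cpp` generates;
// the real file would contain the actual bytes of buffer.dat.
unsigned char buffer_dat[] = {
  0x01, 0x02, 0x03, 0xff
};
unsigned int buffer_dat_len = 4;

// Sums `len` bytes starting at `data`. An unsigned 64-bit accumulator
// avoids the int overflow a large non-zero buffer would otherwise cause.
unsigned long long sum_bytes(const unsigned char* data, std::size_t len) {
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < len; ++i)
        sum += data[i];
    return sum;
}
```

In the real program, another translation unit would declare `extern unsigned char buffer_dat[];` and `extern unsigned int buffer_dat_len;` and call `sum_bytes(buffer_dat, buffer_dat_len)`.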


Solution

  • There's much speculation in the comments that this behavior is explained by compiler optimizations, but neither OP's use of plain g++ (i.e. no optimization flags, so the default -O0) nor the assembly output on the same architecture supports this. The assembly clearly shows buffer being read:

    buffer:
            .zero   268435456
            .section        .rodata
    

    ...

    .L3:
            movl    -4(%rbp), %eax
            cmpl    $268435455, %eax
            ja      .L2
            movl    -4(%rbp), %eax
            cltq
            leaq    buffer(%rip), %rdx
            movzbl  (%rax,%rdx), %eax
            movzbl  %al, %eax
            addl    %eax, -8(%rbp)
            addl    $1, -4(%rbp)
            jmp     .L3
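The `.zero` directive above is how GCC emits a zero-initialized global: the storage goes into the `.bss` section, which takes no space in the executable and is mapped as zero-filled memory at load time. A minimal sketch for checking this on Linux, assuming the linker-provided symbols `edata` (first address past initialized data) and `end` (first address past `.bss`) documented in the end(3) man page; `zero_buf`, `init_buf` and `in_bss` are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Linker-provided symbols (see `man 3 end`): edata marks the end of the
// initialized data segment, end marks the end of the .bss segment.
extern "C" char edata, end;

constexpr std::size_t kLen = 1024 * 1024;
unsigned char zero_buf[kLen];              // zero-initialized: placed in .bss
unsigned char init_buf[4] = {1, 2, 3, 4};  // non-zero initializer: placed in .data

// Returns true if the byte range [p, p + len) lies between edata and end,
// i.e. inside the zero-filled .bss segment.
bool in_bss(const void* p, std::size_t len) {
    const auto addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= reinterpret_cast<std::uintptr_t>(&edata) &&
           addr + len <= reinterpret_cast<std::uintptr_t>(&end);
}
```

`in_bss(zero_buf, kLen)` should be true and `in_bss(init_buf, sizeof init_buf)` false; an array generated by `xxd -i` with non-zero contents behaves like `init_buf`.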
    

    The real reason you're seeing this behavior is the kernel's use of copy-on-write (COW) in its virtual memory system. For a large buffer of zeros like yours, the kernel keeps a single shared "zero page" and maps every page of buffer that is only read to it. A private physical page is allocated for a page of buffer only when that page is first written.
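A minimal sketch of the experiment, assuming Linux (it reads the process's resident set size from the `VmRSS:` line of `/proc/self/status`) and a smaller 64 MiB global buffer to keep it light. `zero_buf`, `rss_kib`, `read_all` and `touch_every_page` are hypothetical names. Reading the whole buffer should leave resident memory almost unchanged, while writing one byte per page should grow it by roughly the buffer size.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <unistd.h>

// 64 MiB zero-initialized global (in .bss, like `buffer` in the question).
constexpr std::size_t kLen = 64u * 1024 * 1024;
unsigned char zero_buf[kLen];

// Resident set size of this process in KiB, or -1 if unavailable (non-Linux).
long rss_kib() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {  // line starts with "VmRSS:"
            std::istringstream fields(line.substr(6));
            long kib = -1;
            fields >> kib;                   // value is reported in kB
            return kib;
        }
    }
    return -1;
}

// Reads every byte; read faults map the shared zero page, so RSS barely grows.
unsigned long long read_all() {
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < kLen; ++i)
        sum += zero_buf[i];
    return sum;
}

// Writes one byte per page; each first write gives that page its own
// physical page, so RSS grows by about kLen.
void touch_every_page() {
    const auto page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    for (std::size_t i = 0; i < kLen; i += page)
        zero_buf[i] = 1;
}
```

Calling `read_all()` and then `touch_every_page()`, printing `rss_kib()` after each step, reproduces the 188 KiB vs 256 MiB difference from the question at a smaller scale.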

    This is actually true in your second example as well (i.e. the behavior is the same), but there you write to every page, which forces a private physical page to be allocated ("paged in") for the entire buffer. Try writing only the first buffer_len/2 elements, and you'll see that the process uses about 128.2 MiB, half of the buffer's true size:

    (Screenshot: process memory usage with only half of the buffer initialized.)
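The half-write experiment can be sketched the same way, again assuming Linux and a 64 MiB stand-in for the 256 MiB buffer; `half_buf`, `rss_kib` and `rss_growth_after_half_write` are hypothetical names. Only the pages that are written become resident, so the growth should be close to `kLen / 2`, give or take page (or transparent huge page) rounding.

```cpp
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical 64 MiB stand-in for the question's 256 MiB buffer.
constexpr std::size_t kLen = 64u * 1024 * 1024;
unsigned char half_buf[kLen];

// Resident set size of this process in KiB, or -1 if unavailable (non-Linux).
long rss_kib() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {
            std::istringstream fields(line.substr(6));
            long kib = -1;
            fields >> kib;
            return kib;
        }
    }
    return -1;
}

// Fills only the first half of the buffer with random bytes, as in the
// question's second program, and returns how much RSS grew (KiB).
long rss_growth_after_half_write() {
    const long before = rss_kib();
    for (std::size_t i = 0; i < kLen / 2; ++i)
        half_buf[i] = static_cast<unsigned char>(std::rand() % 256);
    return rss_kib() - before;
}
```

With the original 256 MiB buffer, the same code would be expected to show roughly 128 MiB of growth, matching the screenshot.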

    Here's also a helpful summary from Wikipedia:

    The copy-on-write technique can be extended to support efficient memory allocation by having a page of physical memory filled with zeros. When the memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space.
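The "use memory sparsely" point can be illustrated with a sketch that reserves a large anonymous mapping and writes to only a few pages of it. It assumes Linux: `mmap` with `MAP_NORESERVE`, `madvise(MADV_NOHUGEPAGE)` so that transparent huge pages do not round each touch up to 2 MiB, and `VmRSS` from `/proc/self/status`. `rss_kib` and `sparse_touch_growth_kib` are hypothetical names.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <sys/mman.h>

// Resident set size of this process in KiB, or -1 if unavailable (non-Linux).
long rss_kib() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.rfind("VmRSS:", 0) == 0) {
            std::istringstream fields(line.substr(6));
            long kib = -1;
            fields >> kib;
            return kib;
        }
    }
    return -1;
}

// Reserves `reserve_bytes` of anonymous memory, writes one byte every
// `stride` bytes, and returns the RSS growth in KiB (-1 if mmap fails).
// Only the written pages should become resident.
long sparse_touch_growth_kib(std::size_t reserve_bytes, std::size_t stride) {
    void* p = mmap(nullptr, reserve_bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    // Prefer base pages; failure is harmless (e.g. THP not compiled in).
    (void)madvise(p, reserve_bytes, MADV_NOHUGEPAGE);

    auto* bytes = static_cast<unsigned char*>(p);
    const long before = rss_kib();
    for (std::size_t off = 0; off < reserve_bytes; off += stride)
        bytes[off] = 1;  // first write to each page allocates a physical page
    const long growth = rss_kib() - before;

    munmap(p, reserve_bytes);
    return growth;
}
```

For example, `sparse_touch_growth_kib(512u * 1024 * 1024, 8u * 1024 * 1024)` touches 64 pages of a 512 MiB reservation, so the reported growth should be a few hundred KiB at most rather than 512 MiB.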