The following code declares a global array (256 MiB) and calculates the sum of its items.
This program consumes 188 KiB when running:
#include <cstdlib>
#include <iostream>
using namespace std;
const unsigned int buffer_len = 256 * 1024 * 1024; // 256 MiB
unsigned char buffer[buffer_len];
int main()
{
while(true)
{
int sum = 0;
for(int i = 0; i < buffer_len; i++)
sum += buffer[i];
cout << "Sum: " << sum << endl;
}
return 0;
}
The following code is like the code above, but it sets the array elements to random values before calculating the sum of the array items.
This program consumes 256.1 MiB when running:
#include <cstdlib>
#include <iostream>
using namespace std;
const unsigned int buffer_len = 256 * 1024 * 1024; // 256 MiB
unsigned char buffer[buffer_len];
int main()
{
while(true)
{
// ********** Changing array items.
for(int i = 0; i < buffer_len; i++)
buffer[i] = std::rand() % 256;
// **********
int sum = 0;
for(int i = 0; i < buffer_len; i++)
sum += buffer[i];
cout << "Sum: " << sum << endl;
}
return 0;
}
Why don't global arrays consume memory when they are only read (188 KiB vs. 256 MiB)?
Update:
In my real scenario I will generate the buffer with the xxd command, so its elements are not zero:
$ xxd -i buffer.dat buffer.cpp
There's much speculation in the comments that this behavior is explained by compiler optimizations, but the OP's use of plain g++ (i.e. without optimization) doesn't support this, and neither does the assembly output on the same architecture, which clearly shows buffer being used:
buffer:
.zero 268435456
.section .rodata
...
.L3:
movl -4(%rbp), %eax
cmpl $268435455, %eax
ja .L2
movl -4(%rbp), %eax
cltq
leaq buffer(%rip), %rdx
movzbl (%rax,%rdx), %eax
movzbl %al, %eax
addl %eax, -8(%rbp)
addl $1, -4(%rbp)
jmp .L3
The real reason you're seeing this behavior is the kernel's use of copy-on-write in its VM system. Essentially, for a large buffer of zeros like the one you have here, the kernel creates a single "zero page" and points all pages in buffer to it. Only when a page is written to does it get its own physical memory allocated.
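To see this for yourself, here is a minimal sketch that reports the process's resident set size before and after reading and writing a zero-initialized global array. It assumes Linux (it reads /proc/self/statm), and the names demo_buffer, kDemoLen, resident_bytes, read_all and write_all are made up for illustration. The buffer is 64 MiB rather than 256 MiB only to keep the experiment quick.

```cpp
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Hypothetical demo buffer: smaller than the 256 MiB in the question (an assumption
// made so the sketch runs quickly). Zero-initialized globals like this live in .bss.
constexpr std::size_t kDemoLen = 64u * 1024 * 1024; // 64 MiB
unsigned char demo_buffer[kDemoLen];

// Resident set size of this process in bytes, taken from the second field of
// /proc/self/statm (Linux only). Returns 0 if the file cannot be read.
std::size_t resident_bytes()
{
    std::ifstream statm("/proc/self/statm");
    std::size_t total_pages = 0, resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages))
        return 0;
    return resident_pages * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Reads every byte of the buffer and returns the sum. Reading alone should not
// require the kernel to give each page its own physical memory.
unsigned long long read_all()
{
    unsigned long long sum = 0;
    for (std::size_t i = 0; i < kDemoLen; i++)
        sum += demo_buffer[i];
    return sum;
}

// Writes a nonzero value to every byte, which forces real pages to be allocated.
void write_all()
{
    for (std::size_t i = 0; i < kDemoLen; i++)
        demo_buffer[i] = 1;
}
```

Call resident_bytes() from main before anything else, again after read_all(), and once more after write_all(). On a typical Linux system I would expect the figure to stay roughly flat after the read pass and to grow by about the size of the buffer after the write pass, though exact numbers depend on the kernel and its settings.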
This is actually true in your second example as well (i.e. the behavior is the same), but there you're writing data to every page, which forces the memory to be "paged in" for the entire buffer. Try writing only buffer_len/2 elements, and you'll see that 128.2 MiB of memory gets allocated by the process, which is half of the true size of the buffer.
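The half-buffer experiment can be sketched roughly like this. Again this assumes Linux and /proc/self/statm, and the names half_buffer, kHalfDemoLen, resident_bytes_now, touch_prefix and growth_after_touching_half are hypothetical. It writes only one byte per page, since a single write anywhere in a page is enough to make the kernel allocate it.

```cpp
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Hypothetical demo buffer (64 MiB instead of the question's 256 MiB, an assumption
// made to keep the experiment small).
constexpr std::size_t kHalfDemoLen = 64u * 1024 * 1024;
unsigned char half_buffer[kHalfDemoLen];

// Resident set size in bytes from /proc/self/statm (Linux only); 0 on failure.
std::size_t resident_bytes_now()
{
    std::ifstream statm("/proc/self/statm");
    std::size_t total_pages = 0, resident_pages = 0;
    if (!(statm >> total_pages >> resident_pages))
        return 0;
    return resident_pages * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Writes one byte in each page of the first `bytes` bytes of the buffer.
void touch_prefix(std::size_t bytes)
{
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    for (std::size_t i = 0; i < bytes && i < kHalfDemoLen; i += page)
        half_buffer[i] = 1;
}

// Returns how many bytes the resident set grew after writing to half the buffer.
std::size_t growth_after_touching_half()
{
    const std::size_t before = resident_bytes_now();
    touch_prefix(kHalfDemoLen / 2);
    const std::size_t after = resident_bytes_now();
    return after > before ? after - before : 0;
}
```

Calling growth_after_touching_half() from main should report growth close to half the buffer size, while the untouched second half stays unallocated. Settings such as transparent huge pages can round the figure up somewhat.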
Here's also a helpful summary from Wikipedia:
The copy-on-write technique can be extended to support efficient memory allocation by having a page of physical memory filled with zeros. When the memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space.