I have a program compiled with gcc 11.2, which allocates some RAM memory first (8 GB) on heap (using new), and later fills it with data read out in real-time from an oscilloscope.
uint32_t* buffer = new uint32_t[0x80000000];
for(uint64_t i = 0; i < 0x80000000; ++i) buffer[i] = GetValueFromOscilloscope();
The problem I am facing is that the optimizer skips the allocation on the first line, and does it on the fly as I'm traversing the loop. This slows down the time spent on each iteration of the loop. Because it is important to be as efficient as possible during the loop, I have found a way to force the compiler to allocate the memory before entering the for loop, namely to set all the reserved values to zero:
uint32_t* buffer = new uint32_t[0x80000000]();
My question is: ¿is there a less intrusive way of achieving the same effect without forcing the data to be zero on the first place (apart from switching off the optimization flags)? I just want to force the compiler to reserve the memory at moment of declaration, but I do not care if the reserved values are zero or not.
Thanks in advance!
EDIT1: The evidence I see for knowing that the optimizer delaying the allocation is that the 'gnome-system-monitor' shows a slowly growing RAM memory as I traverse the loop, and only after I finish the loop, it reaches 8 GiB. Whereas if I initialize all the values to zero, it the gnome-system-monitor shows a quick growth up to 8 GiB, and then it starts the loop.
EDIT2: I am using Ubuntu 22.04.1 LTS
You seem to be misinterpreting the situation. Virtual memory within a user-space process (heap space in this case) does get allocated “immediately” (possibly after a few system calls that negotiate a larger heap).
However, each page-aligned page-sized chunk of virtual memory that you haven’t touched yet will initially lack a physical page backing. Virtual pages are mapped to physical pages lazily, (only) when the need arises.
That said, the “allocation” you are observing (as part of the first access to the big heap space) is happening a few layers of abstraction below what GCC can directly influence and is handled by your operating system’s paging mechanism.
Side note: Another consequence would be, for example, that allocating a 1 TB chunk of virtual memory on a machine with, say, 128 GB of RAM will appear to work perfectly fine, as long as you never access most of that huge (lazily) allocated space. (There are configuration options that can limit such memory overcommitment if need be.)
When you touch your newly allocated virtual memory pages for the first time, each of them causes a page fault and your CPU ends up in a handler in the kernel because of that. The kernel evaluates the situation and establishes that the access was in fact legit. So it “materializes” the virtual memory page, i.e. picks a physical page to back the virtual page and updates both its bookkeeping data structures and (equally importantly) the hardware page mapping mechanism(s) (e.g. page tables or TLB, depending on architecture). Then the kernel switches back to your userspace process, which will have no clue that all of this just happened. Repeat for each page.
Presumably, the description above is hugely oversimplified. (For example, there can be multiple page sizes to strike a balance between mapping maintenance efficiency and granularity / fragmentation etc.)
A simple and ugly way to ensure that the memory buffer gets its hardware backing would be to find the smallest possible page size on your architecture (which would be 4 kiB on a x86_64, for example, so 1024 of those integers (well, in most cases)) and then touch each (possible) page of that memory beforehand, as in: for (size_t i = 0; i < 0x80000000; i += 1024) buffer[i] = 1;
.
There are (of course) more reasonable solutions than that↑; this is just an example to illustrate what’s happening and why.