I created a sample to study the TLB access/misses statistics. Sample writes 1
to every 4096-th element of the array. Array has 10'000 * 4096
bytes. I expect to see 10'000
TLB stores only, but generated assembly loads beginning of the array every iteration, resulting in 10'000
TLB loads in addition to stores. -O3
optimization is applied
When I looked into the assembly, I noticed that the for-loop looks like that:
Question: Why step 1 is executed every single iteration? Beginning of the array is not changing. I expect the beginning to be loaded once and the jump to be to step 2
(main just calls this test_function
10K times):
#include <stdlib.h>
#define REPEAT 10000
#define PAGESIZE 4096
#define PAGES 10000
char *data = NULL;
inline void test_function()
{
for (int i = 0; (i < PAGES * PAGESIZE); i += (PAGESIZE)) {
data[i] = 1;
}
}
int main()
{
data = (char *) malloc(PAGES * PAGESIZE);
for (int idx = 0; idx < REPEAT; idx++)
test_function();
free(data);
return 0;
}
1090: mov rdx,QWORD PTR [rip+0x2f81] # 4018 <data>
1097: mov BYTE PTR [rdx+rax*1],0x1
109b: add rax,0x1000
10a1: cmp rax,0x2710000
10a7: jne 1090 <main+0x30>
Per function call we can see:
1002500646 L1-dcache-load:u (66.50%)
998346580 L1-dcache-stores:u (66.68%)
995750153 dTLB-loads:u (66.81%)
998020884 dTLB-stores:u (66.74%)
34586 dTLB-loads-misses:u (66.68%)
1003649391 dTLB-stores-misses:u (66.58%)
Intel(R) Core(TM) i7-10610U CPU
Ubuntu 22.04.3 LTS
g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Compilation line: g++ ./tlb.cpp -O3 -g -o gcc.out
The code does not know what data
holds when the code runs. Therefore, it is conceivable that data[i]
overlays data
:
data = (char *)&data;
when this happens, assigning to data[i]
could modify data
itself. Thus the compiler cannot simply assume they do not alias and must reload data
after each access. Granted, with some more code it could fix this, but the compiler writers have not implemented that.
To avoid this problem, make sure to cache global variables in local variables if this sort of problem could occur:
inline void test_function()
{
char *d;
d = data;
for (int i = 0; (i < PAGES * PAGESIZE); i += (PAGESIZE)) {
d[i] = 1;
}
}