
char x[2048] and cache line issue


The following is a simple C program, where char x[2048] is a global variable; func1 is called by thread1 and func2 is called by thread2:

#include <stdio.h>
#include <string.h>

char x[2048] = {0}, y[16] = {0};

void func1(void) {          /* called by thread1 */
    strcpy(x, y);
}

void func2(void) {          /* called by thread2 */
    printf("(%s)\n", x);
}

int main(int argc, char **argv) {
    strncpy(y, argv[1], sizeof(y) - 1);
}

On Intel CPUs, one cache line is 64 bytes, so x spans 32 cache lines. My questions are:

  1. When thread1 calls func1, must all 32 cache lines be loaded into that CPU's cache before the strcpy runs? Or does the compiler know that one cache line is enough to do the job?

  2. When thread2 calls func2, must all 32 cache lines be loaded into that CPU's cache before the printf runs? Or can the compiler identify that one cache line is enough?


Solution

  • I suggest you read the Wikipedia page: https://en.wikipedia.org/wiki/CPU_cache

    Some background:

    1. Normally, cache lines ($L) are transparent to programs, so most programmers don't deal with cache lines directly (bringing them in, evicting them). When the CPU finds that code/data is not in the cache, it stalls on that memory access and brings the line in on demand.
    2. Although there are coding techniques to bring data into cache lines explicitly (e.g. via prefetch instructions), compilers normally aren't smart enough to do this for you: they might prefetch too early (so by the time the $L is needed, it has already been evicted) or too late (the CPU still stalls on the memory access).
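    As a quick sanity check of the 64-byte figure, on Linux with glibc you can query the cache line size at run time (this is a glibc/Linux extension, not portable C; a minimal sketch):

    ```c
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* L1 data cache line size; typically 64 bytes on Intel CPUs.
           May report 0 or -1 where the kernel doesn't expose it. */
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        printf("L1 data cache line: %ld bytes\n", line);
        return 0;
    }
    ```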

    Answers to your questions:

    1. No. The compiler doesn't know how many $Ls need to be brought in (how could it know whether a piece of data is already cached? It stays on the safe side rather than trying to outsmart itself). The compiler just emits, for example, a MOV instruction; the CPU, while executing it, finds that the operand is not in the cache and brings it in on demand. Since your program only copies up to the '\0', the cache-line loading stops there too.
    2. Same as #1. Only the $Ls that are actually read are brought in, and the compiler has nothing to do with it.
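    You can see the '\0' point directly: strcpy reads and writes only through the terminator, so copying a 6-byte string into a 2048-byte buffer touches a few bytes (one or two cache lines), no matter how big the destination is. A minimal sketch:

    ```c
    #include <assert.h>
    #include <string.h>

    static char x[2048];   /* global, zero-initialized */

    int main(void) {
        const char y[16] = "hello";
        strcpy(x, y);                        /* copies 6 bytes: "hello" + '\0' */
        assert(strcmp(x, "hello") == 0);
        assert(x[6] == 0 && x[2047] == 0);   /* bytes past the copy are untouched */
        return 0;
    }
    ```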

    More Info:

    1. The CPU's hardware prefetcher may bring in additional $Ls besides those currently needed. For example, it might fetch the next $L, hoping to exploit spatial locality.
    2. Some advanced programs use prefetch instructions to improve performance. If you know your code will access some location in the near future, you can prefetch it, and by the time you need it, it is already cached, so you avoid the $L-miss penalty. But it's hard to get right: you have to know the memory-access pattern of your code and insert the prefetch instruction at the right place. Some high-performance code designs a software pipeline to do this, but again that's an advanced topic.
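    As an illustration (assuming GCC or Clang, which provide the `__builtin_prefetch` builtin; the AHEAD distance is a made-up value you would tune per workload), a loop can request cache lines a fixed distance ahead of where it is currently reading:

    ```c
    #include <stdio.h>

    #define N     4096
    #define AHEAD 8      /* prefetch distance in elements; workload-dependent */

    static double a[N], b[N];

    int main(void) {
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            a[i] = b[i] = 1.0;
        for (int i = 0; i < N; i++) {
            if (i + AHEAD < N)
                /* hint: read access, low temporal locality */
                __builtin_prefetch(&a[i + AHEAD], 0, 1);
            sum += a[i] * b[i];
        }
        printf("%.0f\n", sum);
        return 0;
    }
    ```

    The prefetch is purely a hint: removing it changes nothing about the result, only (potentially) the number of stalls on cache misses.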

    https://en.wikipedia.org/wiki/Instruction_prefetch