Analysis of C code


Here is a function that I am writing on a 64-bit Linux machine.

void myfunc(unsigned char* arr) //array of 8 bytes is passed by reference
{
   unsigned long a = 0; //8 bytes
   unsigned char* LL = (unsigned char*) &a;

   LL[0] = arr[6];
   LL[1] = arr[3];
   LL[2] = arr[1];
   LL[3] = arr[7];
   LL[4] = arr[5];
   LL[5] = arr[4];
   LL[6] = arr[0];
   LL[7] = arr[2];
}

Now my questions are:

  1. Will the variable 'a' be stored in a register, so that it is not fetched again and again from RAM or cache?
  2. On a 64-bit architecture, should I assume that the 'arr' array will be stored in a register, since function parameters are passed in registers on 64-bit architectures?
  3. How efficient is pointer type casting? My guess is that it should not be inefficient at all.

Any help would be appreciated.

Regards


Solution

    1. a is unlikely to be kept in a register, as you have taken its address and write to it one byte at a time through that pointer. (valdo correctly points out that a really smart compiler could optimize the array accesses into bit operations and leave a in a register, but I've never seen a compiler do that, and I'm not sure it would wind up being faster.)
    2. arr (the pointer itself) is passed in a register (%rdi, on amd64 Linux). The contents of the array are in memory.
    3. Pointer type casting by itself often generates no code at all. However, doing silly things with type casts can lead to very inefficient code, or even to code whose behavior is undefined.
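
To illustrate the third point: looking at an object through an unsigned char pointer, as the question does, is allowed. Going the other way, for example reading the array as *(unsigned long *)arr, can break the aliasing and alignment rules. A minimal sketch of the usual alternative, memcpy, which compilers typically reduce to a single load at -O2 (load_u64 is a hypothetical name, not something from the question):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 8 bytes into an integer in the machine's
   native byte order, without casting the byte pointer to a wider type. */
uint64_t load_u64(const unsigned char *bytes)
{
    uint64_t v;
    memcpy(&v, bytes, sizeof v);  /* well-defined for any alignment */
    return v;
}
```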

    It looks like you are trying to permute the bytes in an array and then shove them into a number, and the machine code your example generates is not terribly bad for that. David's suggestion to use shift and mask operations instead is good (this will also avoid problems if your code ever needs to run on a big-endian machine), and there are also the SSE vector permute instructions, but I have heard they're kind of a pain to use.
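
The shift-and-mask approach mentioned above might look roughly like this. It is only a sketch: permute_shift is a made-up name, and it assumes the intent is the value the question's code produces on a little-endian machine (LL[0] is the least significant byte).

```c
#include <stdint.h>

/* Hypothetical shift-based version of the question's permutation.
   arr[6] becomes the least significant byte and arr[2] the most
   significant one, whatever the host byte order is. */
uint64_t permute_shift(const unsigned char *arr)
{
    return  (uint64_t)arr[6]
         | ((uint64_t)arr[3] <<  8)
         | ((uint64_t)arr[1] << 16)
         | ((uint64_t)arr[7] << 24)
         | ((uint64_t)arr[5] << 32)
         | ((uint64_t)arr[4] << 40)
         | ((uint64_t)arr[0] << 48)
         | ((uint64_t)arr[2] << 56);
}
```

Because the value never has its address taken, nothing here forces it into memory.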

    Incidentally, you should make the return type of your example function be unsigned long and put return a; at the very end; then you can use gcc -O2 -S and see exactly what you get from compilation. Without the change to return a, GCC will cheerfully optimize away the entire body of the function, since it has no externally visible side effects.
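
With the change suggested above applied, the function would look something like the sketch below. The static assertion is an addition of mine (it needs C11) to make the 8-byte assumption explicit. Save it as, say, myfunc.c (a hypothetical file name), run gcc -O2 -S myfunc.c, and read the generated myfunc.s.

```c
/* The question's code relies on unsigned long being 8 bytes,
   which holds on 64-bit Linux but not everywhere. */
_Static_assert(sizeof(unsigned long) == 8, "expects 8-byte unsigned long");

unsigned long myfunc(unsigned char *arr) /* arr points to 8 bytes */
{
    unsigned long a = 0;
    unsigned char *LL = (unsigned char *)&a;

    LL[0] = arr[6];
    LL[1] = arr[3];
    LL[2] = arr[1];
    LL[3] = arr[7];
    LL[4] = arr[5];
    LL[5] = arr[4];
    LL[6] = arr[0];
    LL[7] = arr[2];

    return a; /* the result is now visible to the caller */
}
```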