
Don't fully understand custom-written 'memcpy' function in C


So I was browsing the Quake engine source code earlier today and stumbled upon some custom-written utility functions. One of them was 'Q_memcpy':

void Q_memcpy (void *dest, void *src, int count)
{
    int             i;

    if (( ( (long)dest | (long)src | count) & 3) == 0 )
    {
        count>>=2;
        for (i=0 ; i<count ; i++)
            ((int *)dest)[i] = ((int *)src)[i];
    }
    else
        for (i=0 ; i<count ; i++)
            ((byte *)dest)[i] = ((byte *)src)[i];
}

I understand the overall premise of the function, but I don't quite understand the reason for the bitwise OR between the source and destination addresses. My questions are as follows:

  • Why is 'count' included in the same bitwise operation?
  • Why are the last two bits of the result checked?
  • What purpose does this whole check serve?

I'm sure it's something obvious, but please excuse my ignorance: I haven't really delved into the more low-level side of programming. I just find it interesting and want to learn more.


Solution

  • It first tests if all 3 arguments are divisible by 4. If - and only if - they all are, it proceeds with copying 4 bytes at a time.

    In other words, written out explicitly, this would be

    if ((long) src % 4 == 0 && (long) dest % 4 == 0 && count % 4 == 0 )
    {
        count = count / 4;
        for (i = 0; i < count; i++)
            ((int *)dest)[i] = ((int *)src)[i];
    }
    

    I am not sure whether they tested their compiler and found that it generated poor code for the plain version of the test, and therefore decided to write it in such a convoluted way. In any case, x | y | z guarantees that bit n is set in the result if it is set in any of x, y or z. Therefore, if (x | y | z) & 3 is 0, none of the three values has either of its two lowest bits set, so all of them are divisible by 4.
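To make the equivalence concrete, here is a minimal sketch that puts the combined OR test next to the three separate modulo tests. The helper names (all_multiples_of_4_or, all_multiples_of_4_mod, count_mismatches) are made up for this illustration and are not part of the Quake source; plain unsigned integers stand in for the two addresses and the count.

```c
/* Combined form, as in Q_memcpy: OR the values together, then check
   whether either of the two lowest bits survived. */
static int all_multiples_of_4_or(unsigned long x, unsigned long y, unsigned long z)
{
    return ((x | y | z) & 3UL) == 0;
}

/* Spelled-out form: test each value separately. */
static int all_multiples_of_4_mod(unsigned long x, unsigned long y, unsigned long z)
{
    return x % 4 == 0 && y % 4 == 0 && z % 4 == 0;
}

/* Hypothetical self-check: compares both forms for every triple below
   'limit' and returns how many times they disagree. */
static int count_mismatches(unsigned long limit)
{
    int mismatches = 0;
    unsigned long x, y, z;

    for (x = 0; x < limit; x++)
        for (y = 0; y < limit; y++)
            for (z = 0; z < limit; z++)
                if (all_multiples_of_4_or(x, y, z) != all_multiples_of_4_mod(x, y, z))
                    mismatches++;
    return mismatches;
}
```

For example, all_multiples_of_4_or(8, 16, 4) is true, while all_multiples_of_4_or(8, 18, 4) is false because 18 has bit 1 set.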


    Of course it would be rather silly to use now - the standard library memcpy in recent library implementations is almost certainly better than this.

    Therefore, on recent compilers you can optimize all calls to Q_memcpy by switching them to memcpy. GCC, for example, can generate 64-bit or SIMD moves for memcpy, depending on the size of the area being copied.
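As a rough sketch of what that switch could look like, the wrapper below keeps the original int count parameter and forwards to the standard memcpy. The name Q_memcpy_std is hypothetical, and the sketch assumes the two buffers do not overlap, since memcpy's behavior is undefined for overlapping regions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for Q_memcpy that defers to the C library.
   Assumes dest and src do not overlap. */
static void Q_memcpy_std(void *dest, const void *src, int count)
{
    /* The original loops copy nothing when count is zero or negative,
       so keep that behavior instead of converting a negative value
       to a huge size_t. */
    if (count <= 0)
        return;

    memcpy(dest, src, (size_t)count);
}
```

Callers that already hold a size_t length can simply call memcpy directly and drop the wrapper.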