
Implementation of memset to set a whole word instead of byte by byte in C


I'm trying to implement my own MemSet that does the same as memset but also:

  • Sets word-sized chunks when possible, instead of byte by byte.

  • Guarantees that dest is aligned before whole words are written.

  • Is tested against every possible alignment of dest.

Here is my code:

#include <stddef.h>
#include <stdio.h>

void *MemSet(void *dest, int c, size_t n)
{
    unsigned char *runner = (unsigned char *)dest;
    
    size_t i = 0;
    
    unsigned char swap_word[sizeof(size_t)];
    
    for (i = 0; i < sizeof(size_t); ++i)
    {
        swap_word[i] = (unsigned char)c;
    }
    
    if (NULL == dest)
    {
        return (NULL);
    }
    
    while (n > 0)
    {
        /* setting byte by byte */
        if (n < sizeof(size_t) || (((size_t)runner & (sizeof(size_t) - 1)) != 0))
        {
            *runner++ = (unsigned char)c;
            --n;
            printf("Byte written\n"); /* for debugging */
        }
        else
        {
            /* setting a whole word */
            *((void **)runner) = *((void **)swap_word);
            runner += sizeof(size_t);
            n -= sizeof(size_t);
            printf("Word written\n"); /* for debugging */
        }
    }
    return (dest);
}

Here is what the code does:

  • Creating an unsigned char pointer, runner, to walk over dest without changing dest itself, so I can still return dest at the end.

  • Creating a swap_word array of sizeof(size_t) bytes, all set to c. I use sizeof(size_t) because it tells me whether the machine is 32-bit or 64-bit (and therefore whether the word size is 4 or 8 bytes). This array is copied into dest whenever I need to set a whole word.

  • Running a while loop that checks whether at least sizeof(size_t) bytes are left to set; if not, a whole word certainly won't fit, so the remaining bytes are set one by one.

  • Also setting bytes one by one when the address isn't divisible by 4 or 8 (again, depending on the machine), because a word written there would cross a word boundary. Bytes are set individually until an aligned address is reached.

  • Setting a whole word only when runner is already aligned to the machine's word size: all sizeof(size_t) bytes of swap_word are copied in one store, and runner advances by sizeof(size_t) addresses. I do this with the cast:

    *((void **)runner) = *((void **)swap_word);

And this is my test file:

#include <stddef.h>
#include <stdio.h>

void *MemSet(void *dest, int c, size_t n);

int array[] = { 2, 3 };

int main(void)
{
    size_t i;

    for (i = 0; i < 2; i++)
    {
        printf("Before MemSet, target is \"%d\"\n\n", array[i]);
    }
    if (NULL == MemSet(array, 3, 2 * sizeof(int)))
    {
        fprintf(stderr, "MemSet failed!\n");
        return (1);
    }
    for (i = 0; i < 2; i++)
    {
        printf("After MemSet, target is \"%d\"\n\n", array[i]);
    }
    return (0);
}

Output is:

Before MemSet, target is "2"

Before MemSet, target is "3"

Word written
After MemSet, target is "50529027"

After MemSet, target is "50529027"

Why aren't both elements 3? I'm calling

MemSet(array, 3, 2 * sizeof(int))

which, in theory, should set both elements to 3, because the array occupies 2 * sizeof(int) bytes of memory and I set all of them to 3.

What do you think? Also, how can I check whether my alignment handling works?

Thanks.


Solution

  • Your function has multiple problems:

    • you test whether a word-sized store is possible at each iteration, which is likely slower than the simple byte operation.

    • *((void **)runner) = *((void **)swap_word); is incorrect because it violates the strict aliasing rule and because swap_word might not be correctly aligned for the void * type.

    You should run separate loops:

    • the first one to align the destination pointer
    • the second one to set full words, possibly more than one at a time
    • the last one to set the trailing bytes if any

    Here is an example:

    #include <limits.h>
    #include <stdio.h>
    #include <stdint.h>
    
    // assuming uintptr_t has no padding bits
    void *MemSet(void *dest, int c, size_t n) {
        if (dest != NULL) {
            unsigned char *p = dest;
            if (n >= sizeof(uintptr_t)) {
                // align destination pointer
                // this test is not fully defined but works on all classic targets
                while ((uintptr_t)p & (sizeof(uintptr_t) - 1)) {
                    *p++ = (unsigned char)c;
                    n--;
                }
                // compute word value (generalized chux formula)
                uintptr_t w = UINTPTR_MAX / UCHAR_MAX * (unsigned char)c;
                // added a redundant (void *) cast to prevent compiler warning
                uintptr_t *pw = (uintptr_t *)(void *)p;
                // set 16 or 32 bytes at a time
                while (n >= 4 * sizeof(uintptr_t)) {
                    pw[0] = w;
                    pw[1] = w;
                    pw[2] = w;
                    pw[3] = w;
                    pw += 4;
                    n -= 4 * sizeof(uintptr_t);
                }
                // set the remaining 0 to 3 words
                while (n >= sizeof(uintptr_t)) {
                    *pw++ = w;
                    n -= sizeof(uintptr_t);
                }
                p = (unsigned char *)pw;
            }
            // set the trailing bytes
            while (n-- > 0) {
                *p++ = (unsigned char)c;
            }
        }
        return dest;
    }
    

    Note however that the above code is unlikely to beat memset() because:

    • the compiler may expand the above logic inline for constant sizes, skipping the alignment tests if the destination pointer is known to be aligned or if the CPU allows unaligned access.
    • the library may use specialized instructions such as SIMD or REP/STOS to increase throughput depending on the actual target CPU.

    The reason for the surprising result is that int spans 4 bytes on your platform, and memset (like your MemSet) sets each of those bytes to 3, so each integer ends up as 0x03030303, which is exactly 50529027. memset fills bytes, not int elements.