c algorithm sorting memcpy insertion-sort

Is there a better way to do Insertion Sort?

YouTube Video to Insertion Sort - https://www.youtube.com/watch?v=JU767SDMDvA

This is my implementation of it in C

void insertionSort(void *data, uint32_t length, int8_t compareTo(const void * const, const 
  void * const), const size_t bytes){
  uint32_t i;
  for(i = 0; i < length; i++){
    uint8_t isSorted;
    int32_t j;
    isSorted = 0;
    for(j = i - 1; j > -1 && !isSorted; j--){
        isSorted = 1;
        if(compareTo((int8_t *)data + j * bytes, (int8_t *)data + (j + 1) * bytes) > 0){
            uint32_t byteIndex;
            void *valCopy;
            valCopy = malloc(bytes);
            memcpy(valCopy, (int8_t *)data + j * bytes, bytes);

            for(byteIndex = 0; byteIndex < bytes; byteIndex++){
                *((int8_t *)data + (j * bytes + byteIndex)) = *((int8_t *)data + ((j + 1) * bytes + byteIndex));
                *((int8_t *)data + ((j + 1) * bytes + byteIndex)) = *((int8_t *)valCopy + byteIndex);
            }

            /**
            instead of the for loop you can replace it with this to make it look more clean
            memcpy((int8_t *)data + j * bytes, (int8_t *)data + (j + 1) * bytes, bytes);
            memcpy((int8_t *)data + (j + 1) * bytes, valCopy, bytes);
            **/

            free(valCopy);
            isSorted = 0;
      }
    }
  }
}


int8_t compareTo(const void * const val1, const void * const val2){
   if(*(const int32_t * const)val1 > *(const int32_t * const)val2)return 1;
   else if(*(const int32_t * const)val1 < *(const int32_t * const)val2)return -1;
   return 0;
}

int main(void){
   int32_t i;
   int32_t data[] = {2, 6, 5, 3, 8, 7, 1, 0};
   int32_t dataLength = sizeof(data) / sizeof(*data);

   insertionSort(data, dataLength, &compareTo, sizeof(int32_t));

   for(i = 0; i < dataLength; i++){
       printf("%d ", data[i]);
   }

   return 0;
}

I was wonder if there is a more efficient way then to copy the value each time using memcpy or the for loop?

Solution

As another answer already observes, calling malloc() and free() for every swap is unneeded. All those extra calls indeed appear to be the largest source of inefficiency. You need at most one malloc and one free call, but you can get away without any for item sizes no larger than a limit you choose. For example,

#define SMALL_ITEM_LIMIT 1024 /* for example */

// ...

void insertionSort(void *data, uint32_t length,
    int8_t compareTo(const void * const, const void * const), const size_t bytes) {
    char auto_buffer[SMALL_ITEM_LIMIT];
    char *temp;

    if (bytes > SMALL_ITEM_LIMIT) {
        temp = malloc(bytes);
        if (temp == NULL) {
            // ... handle allocation failure ...
        }
    } else {
        temp = auto_buffer;
    }

    // ... main part of the sort ...

    if (temp != auto_buffer) {
        free(temp);
    }
}

As a minor matter, the use of variable isSorted is unnecessary and a bit clumsy. You can avoid it and probably shave a few cycles by simply breaking from the j loop when the current element reaches its insertion position.

You ask:

I was wonder if there is a more efficient way then to copy the value each time using memcpy or the for loop?

For a generic sort such as this, where you do not know the type of the items being sorted, there is no alternative to bulk memory operations for moving elements. I would be inclined to start with memcpy() and / or memmove(), as it's clearer. Do not use an inner loop for that without testing on a variety of cases to determine whether it genuinely provides any improvement.

However, you don't necessarily need to move elements one position at a time. Instead, on each outer-loop iteration, you can locate the insertion position without moving anything, then perform the insertion via a single n-element rotation. For random data, that will tend to perform fewer reads and writes. That variation might look like this (some names have been changed to make them clearer):

void insertionSort(void *data, uint32_t item_count,
        int compare(const void *, const void *), size_t item_size) {
    char auto_buffer[SMALL_ITEM_LIMIT];
    char *temp = (item_size > SMALL_ITEM_LIMIT) ? malloc(item_size) : auto_buffer;

    if (temp) {
        char *char_data = data;  // for clarity; avoids having to cast all the time

        for (uint32_t i = 1; i < count; i++) { // no point in starting at 0
            // Find the insertion position
            for (uint32_t j = i; j > 0; j--) {
                if (compare(char_data +  j      * item_size,
                            char_data + (j - 1) * item_size) >= 0) {
                    break;
                }
            }
            // Rotate the current item into position
            if (j != i) {
                memcpy(temp, char_data + i * item_size, item_size);
                memmove(char_data +  j      * item_size,
                        char_data + (j + 1) * item_size,
                        (i - j) * item_size);
                memcpy(char_data + j * item_size, temp, item_size);
            }
        }

        if (temp != auto_buffer) {
            free(temp);
        }
    } // else memory allocation failed
}

Alternatively, it might be a bit more efficient in practice to implement the rotations more in parallel with the comparisons to take better advantage of cache and data locality. This like performing only half (or maybe one third) of a swap for each exchange. The sort loop for that would be something like this:

        for (uint32_t i = 1; i < count; i++) { // no point in starting at 0
            // store element i
            memcpy(temp, char_data + i * item_size, item_size);

            // Find the insertion position
            for (uint32_t j = i; j > 0; j--) {
                if (compare(char_data +  j      * item_size,
                            char_data + (j - 1) * item_size) < 0) {
                    // shift element j - 1 up one position
                    memcpy(char_data + (j - 1) * item_size,
                           char_data +  j      * item_size,
                           item_size);
                } else {
                    break;
                }
            }

            if (j != i) {
                // Put the erstwhile value of element i into its position
                memcpy(char_data + j * item_size, temp, item_size);
            }
        }

Which of those will actually perform better in practice under any particular circumstances is a question to answer by testing.