YouTube Video to Insertion Sort - https://www.youtube.com/watch?v=JU767SDMDvA
This is my implementation of it in C
void insertionSort(void *data, uint32_t length, int8_t compareTo(const void * const, const
void * const), const size_t bytes){
uint32_t i;
for(i = 0; i < length; i++){
uint8_t isSorted;
int32_t j;
isSorted = 0;
for(j = i - 1; j > -1 && !isSorted; j--){
isSorted = 1;
if(compareTo((int8_t *)data + j * bytes, (int8_t *)data + (j + 1) * bytes) > 0){
uint32_t byteIndex;
void *valCopy;
valCopy = malloc(bytes);
memcpy(valCopy, (int8_t *)data + j * bytes, bytes);
for(byteIndex = 0; byteIndex < bytes; byteIndex++){
*((int8_t *)data + (j * bytes + byteIndex)) = *((int8_t *)data + ((j + 1) * bytes + byteIndex));
*((int8_t *)data + ((j + 1) * bytes + byteIndex)) = *((int8_t *)valCopy + byteIndex);
}
/**
instead of the for loop you can replace it with this to make it look more clean
memcpy((int8_t *)data + j * bytes, (int8_t *)data + (j + 1) * bytes, bytes);
memcpy((int8_t *)data + (j + 1) * bytes, valCopy, bytes);
**/
free(valCopy);
isSorted = 0;
}
}
}
}
int8_t compareTo(const void * const val1, const void * const val2){
if(*(const int32_t * const)val1 > *(const int32_t * const)val2)return 1;
else if(*(const int32_t * const)val1 < *(const int32_t * const)val2)return -1;
return 0;
}
int main(void){
int32_t i;
int32_t data[] = {2, 6, 5, 3, 8, 7, 1, 0};
int32_t dataLength = sizeof(data) / sizeof(*data);
insertionSort(data, dataLength, &compareTo, sizeof(int32_t));
for(i = 0; i < dataLength; i++){
printf("%d ", data[i]);
}
return 0;
}
I was wonder if there is a more efficient way then to copy the value each time using memcpy or the for loop?
As another answer already observes, calling malloc()
and free()
for every swap is unneeded. All those extra calls indeed appear to be the largest source of inefficiency. You need at most one malloc
and one free
call, but you can get away without any for item sizes no larger than a limit you choose. For example,
#define SMALL_ITEM_LIMIT 1024 /* for example */
// ...
void insertionSort(void *data, uint32_t length,
int8_t compareTo(const void * const, const void * const), const size_t bytes) {
char auto_buffer[SMALL_ITEM_LIMIT];
char *temp;
if (bytes > SMALL_ITEM_LIMIT) {
temp = malloc(bytes);
if (temp == NULL) {
// ... handle allocation failure ...
}
} else {
temp = auto_buffer;
}
// ... main part of the sort ...
if (temp != auto_buffer) {
free(temp);
}
}
As a minor matter, the use of variable isSorted
is unnecessary and a bit clumsy. You can avoid it and probably shave a few cycles by simply break
ing from the j
loop when the current element reaches its insertion position.
You ask:
I was wonder if there is a more efficient way then to copy the value each time using memcpy or the for loop?
For a generic sort such as this, where you do not know the type of the items being sorted, there is no alternative to bulk memory operations for moving elements. I would be inclined to start with memcpy()
and / or memmove()
, as it's clearer. Do not use an inner loop for that without testing on a variety of cases to determine whether it genuinely provides any improvement.
However, you don't necessarily need to move elements one position at a time. Instead, on each outer-loop iteration, you can locate the insertion position without moving anything, then perform the insertion via a single n-element rotation. For random data, that will tend to perform fewer reads and writes. That variation might look like this (some names have been changed to make them clearer):
void insertionSort(void *data, uint32_t item_count,
int compare(const void *, const void *), size_t item_size) {
char auto_buffer[SMALL_ITEM_LIMIT];
char *temp = (item_size > SMALL_ITEM_LIMIT) ? malloc(item_size) : auto_buffer;
if (temp) {
char *char_data = data; // for clarity; avoids having to cast all the time
for (uint32_t i = 1; i < count; i++) { // no point in starting at 0
// Find the insertion position
for (uint32_t j = i; j > 0; j--) {
if (compare(char_data + j * item_size,
char_data + (j - 1) * item_size) >= 0) {
break;
}
}
// Rotate the current item into position
if (j != i) {
memcpy(temp, char_data + i * item_size, item_size);
memmove(char_data + j * item_size,
char_data + (j + 1) * item_size,
(i - j) * item_size);
memcpy(char_data + j * item_size, temp, item_size);
}
}
if (temp != auto_buffer) {
free(temp);
}
} // else memory allocation failed
}
Alternatively, it might be a bit more efficient in practice to implement the rotations more in parallel with the comparisons to take better advantage of cache and data locality. This like performing only half (or maybe one third) of a swap for each exchange. The sort loop for that would be something like this:
for (uint32_t i = 1; i < count; i++) { // no point in starting at 0
// store element i
memcpy(temp, char_data + i * item_size, item_size);
// Find the insertion position
for (uint32_t j = i; j > 0; j--) {
if (compare(char_data + j * item_size,
char_data + (j - 1) * item_size) < 0) {
// shift element j - 1 up one position
memcpy(char_data + (j - 1) * item_size,
char_data + j * item_size,
item_size);
} else {
break;
}
}
if (j != i) {
// Put the erstwhile value of element i into its position
memcpy(char_data + j * item_size, temp, item_size);
}
}
Which of those will actually perform better in practice under any particular circumstances is a question to answer by testing.