Search code examples
algorithmsortingcounting-sort

Counting sort - Efficiency


I was thinking about counting sort and how we implement it, actually how the algorithm works. I am stuck with one part, algorithm is really straightforward and easy to understand but one part of it doesn't seem necessary. I thought people might mistaken or so, but it seems like everyone using the same method so I am mistaken somewhere. Can you please explain.

Here is code for counting sort from geeksforgeeks

    // C Program for counting sort
#include <stdio.h>
#include <string.h>
#define RANGE 255

// The main function that sort the given string arr[] in
// alphabatical order
void countSort(char arr[])
{
    // The output character array that will have sorted arr
    char output[strlen(arr)];

    // Create a count array to store count of inidividul
    // characters and initialize count array as 0
    int count[RANGE + 1], i;
    memset(count, 0, sizeof(count));

    // Store count of each character
    for(i = 0; arr[i]; ++i)
        ++count[arr[i]];

    // Change count[i] so that count[i] now contains actual
    // position of this character in output array
    for (i = 1; i <= RANGE; ++i)
        count[i] += count[i-1];

    // Build the output character array
    for (i = 0; arr[i]; ++i)
    {
        output[count[arr[i]]-1] = arr[i];
        --count[arr[i]];
    }

    // Copy the output array to arr, so that arr now
    // contains sorted characters
    for (i = 0; arr[i]; ++i)
        arr[i] = output[i];
}

// Driver program to test above function
int main()
{
    char arr[] = "geeksforgeeks";//"applepp";

    countSort(arr);

    printf("Sorted character array is %s\n", arr);
    return 0;
}

Cool , but about this part:

// Build the output character array
        for (i = 0; arr[i]; ++i)
        {
            output[count[arr[i]]-1] = arr[i];
            --count[arr[i]];
        }

Why do I need this ?? Ok I counted my numbers :

Let's say I had array -> [1, 3, 6, 3, 2, 4]

         INDEXES     0  1  2  3  4  5  6
  I created this -> [0, 1, 1, 2, 1, 0, 1]

Than this part does this:

  [0, 1+0, 1+1, 2+2, 4+1, 0+5, 1+5]
  [0, 1, 2, 4, 5, 5, 6]

BUT WHY ??

Can't I just use my array like the one before ? Here is my idea and my code, please explain why it's wrong or, why other way is more useful.

void countingSort (int *arr) {

    int countingArray[MAX_NUM] = {0};

    for (i = 0 ; i < ARRAY_SIZE ; i++)
        countingArray[arr[i]]++;

    int output_Index = 0;

    for (i = 0 ; i < MAX_NUM ; i++)
        while ( countingArray[i]-- )
            arr[output_Index++] = i;
}

Solution

  • For the simple case where you are sorting an array of integers, your code is simpler and better.

    However, counting sort is a general sorting algorithm that can sort based on a sorting key derived from the items to be sorted, which is used to compare them, as opposed to directly comparing the items themselves. In the case of an array of integers, the items and the sort keys can be one and the same, you just compare them directly.

    It looks to me as though the geeksforgeeks code has been adapted from a more generic example that allows the use of sorting keys, something like this:

    // Store count of each item
    for(i = 0; arr[i]; ++i)
        ++count[key(arr[i])];
    
    // Change count[i] so that count[i] now contains actual
    // position of this character in output array
    for (i = 1; i <= RANGE; ++i)
        count[i] += count[i-1];
    
    // Build the output array
    for (i = 0; arr[i]; ++i)
    {
        output[count[key(arr[i])]-1] = arr[i];
        --count[key(arr[i])];
    }
    

    Where key is a function that computes a sort key based on an item (for an integer type you could just return the integer itself). In this case MAX_NUM would have to be replaced with MAX_KEY.

    This approach uses the extra output array because the final result is generated by copying the items from arr rather than simply from the information in count (which only contains the count of items with each key). However, an in-place counting sort is possible.

    The algorithm also guarantees a stable sort (items with the same sort key have their relative order preserved by sorting) - this is meaningless when sorting integers.

    However, since they have removed the ability to sort based on key, there's no reason for the extra complexity and your way is better.

    It's also possible that they have copied the code from a language like C++, where the int cast (which will be called when using an item to index an array) could be overloaded to return the sort key, but have mistakenly converted to C.