Tags: arrays, sorting, time-complexity, space-complexity, counting-sort

What kinds of drawbacks are there, performance-wise, if I sort an array using hashing?


I wrote a simple function to sort an array int a[] using hashing. To do this, I stored the frequency of every element in a new array hash1[] and then wrote the values back into the original array in linear time.

#include<bits/stdc++.h>
using namespace std;

int hash1[10000];   // frequency table; assumes every value is in [0, 9999]

void sah(int a[], int n)
{
    // Count how many times each value occurs and track the maximum value.
    int maxo = -1;
    for (int i = 0; i < n; i++)
    {
        hash1[a[i]]++;
        if (maxo < a[i]) { maxo = a[i]; }
    }

    // Walk the counts from 0 to maxo and write each value back
    // into the array as many times as it occurred.
    int i = 0, freq = 0, idx = 0;
    while (i < maxo + 1)
    {
        freq = hash1[i];
        if (freq > 0)
        {
            while (freq > 0)
            {
                a[idx++] = i;
                freq--;
            }
        }
        i++;
    }
}
int main()
{
    int a[] = {6, 8, 9, 22, 33, 59, 12, 5, 99, 12, 57, 7};
    int n = sizeof(a) / sizeof(a[0]);
    sah(a, n);
    for (int i = 0; i < n; i++)
    {
        printf("%d ", a[i]);
    }
}

This algorithm runs in O(max_element). What disadvantages am I facing here, considering only performance (time and space)?


Solution

  • The algorithm you've implemented is called counting sort. Its runtime is O(n + U), where n is the total number of elements and U is the maximum value in the array (assuming the values run from 0 to U), and its space usage is Θ(U). Your particular implementation assumes that U = 10,000. Although you've described your approach as "hashing," this really isn't hashing (computing some function of the elements and using it to put them into buckets) so much as a distribution (spreading elements out according to their values).

    If U is a fixed constant - as it is in your case - then the runtime is O(n) and the space usage is O(1), though remember that big-O describes long-term growth rates and that if U is large the runtime can still be quite high. This makes the approach attractive if you're sorting very large arrays with a restricted range of values. However, if the range of values can be large, it is not a particularly good choice. Interestingly, you can think of radix sort as an algorithm that repeatedly runs counting sort with U = 10 (if using the base-10 digits of the numbers) or U = 2 (if working in binary) and has a runtime of O(n log U), which is strongly preferable for large values of U (a minimal radix sort sketch appears at the end of this answer).

    You can clean up this code in a number of ways; a sketch along these lines follows below. For example, you have an if statement and a while loop with the same condition, which can be combined into a single while loop. You also might want to add some assert checks to make sure all the values are in the range from 0 to 9,999, inclusive, since otherwise you'll have a bounds error. Additionally, you could make the global array either a local variable (though watch your stack usage) or a static local variable (to avoid polluting the global namespace). Alternatively, you could have the caller pass in a parameter specifying the maximum value, or compute it yourself from the input.
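
    Here is a minimal sketch of what those cleanups might look like. The function name counting_sort and the use of std::vector (sized to the actual maximum rather than a fixed 10,000) are my own choices for illustration, not something your code requires:

        #include <cassert>
        #include <cstdio>
        #include <vector>

        // Hypothetical cleaned-up version: computes the maximum itself,
        // sizes the count array to exactly max+1, and merges the redundant
        // if/while pair into a single loop.
        void counting_sort(int a[], int n)
        {
            if (n <= 0) return;

            int maxo = a[0];
            for (int i = 0; i < n; i++)
            {
                assert(a[i] >= 0);             // this counting sort assumes non-negative keys
                if (a[i] > maxo) maxo = a[i];
            }

            std::vector<int> count(maxo + 1, 0);   // Θ(U) space, sized to the actual range
            for (int i = 0; i < n; i++)
                count[a[i]]++;

            int idx = 0;
            for (int v = 0; v <= maxo; v++)
                while (count[v]-- > 0)         // single loop replaces the if + nested while
                    a[idx++] = v;
        }

        int main()
        {
            int a[] = {6, 8, 9, 22, 33, 59, 12, 5, 99, 12, 57, 7};
            int n = sizeof(a) / sizeof(a[0]);
            counting_sort(a, n);
            for (int i = 0; i < n; i++)
                printf("%d ", a[i]);
        }

    Sizing the count array from the input keeps the space usage at Θ(U) for the actual U rather than a guessed upper bound, and the assert documents the non-negativity assumption instead of silently indexing out of bounds.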
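
    And here is a rough sketch of the LSD radix sort idea mentioned above, applied to non-negative ints: each pass is a stable counting sort with U = 10 on one base-10 digit, so the total work is O(n log U). The name radix_sort and the exact loop structure are illustrative, not a reference implementation:

        #include <cstdio>
        #include <vector>

        // Hypothetical LSD radix sort on non-negative ints: repeatedly run a
        // stable counting sort with U = 10 on each base-10 digit.
        void radix_sort(std::vector<int>& a)
        {
            if (a.empty()) return;

            int maxo = a[0];
            for (int x : a) if (x > maxo) maxo = x;

            std::vector<int> out(a.size());
            for (long long exp = 1; maxo / exp > 0; exp *= 10)
            {
                int count[10] = {0};
                for (int x : a) count[(x / exp) % 10]++;               // counting sort on this digit
                for (int d = 1; d < 10; d++) count[d] += count[d - 1]; // prefix sums -> end positions
                for (int i = (int)a.size() - 1; i >= 0; i--)           // stable placement, back to front
                    out[--count[(a[i] / exp) % 10]] = a[i];
                a = out;
            }
        }

        int main()
        {
            std::vector<int> a = {6, 8, 9, 22, 33, 59, 12, 5, 99, 12, 57, 7};
            radix_sort(a);
            for (int x : a) printf("%d ", x);
        }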