Tags: c++, sorting, counting-sort

Counting sort: why do we need the sum of previous counts?


I was reading about counting sort, and one of the steps of the algorithm is:

Counting sort

for(i = 1 to k) 
   c[i] = c[i]+c[i-1];

Why do we need this step?

Why can't we use this instead?

for(i = 0 to k)
    while(c[i]--)
        Output i/Use i as key.

One problem I thought of is if we want the data itself (for example, a string associated with a particular key).

But then we could use a 2D vector. What is wrong with this approach, where we sort data with keys from 0 to 99?

int a[100];
for (int i = 0; i < 100; ++i)
    a[i] = 0;
string s;
vector<vector<string>> data(100);
int temp;
for (int i = 0; i < n; ++i) {
    cin >> temp;             // read the key (0..99)
    a[temp]++;               // count it
    getline(cin, s);         // read the rest of the line as the associated data
    data[temp].push_back(s);
}

for (int i = 0; i < 100; ++i) {
    int current = 0;
    while (a[i]--) {
        cout << data[i][current] << " ";
        ++current;
    }
}

Solution

  • There are two cases in which you might want to use counting sort. First, you may have an array of actual numbers that you want to sort, and the goal is to sort those numbers. Second, you may have an array of values to sort, each of which is sorted according to some key that's a number in a fixed range.

    In the first case - where you're purely sorting numbers - there's no reason why you need to convert the histogram into a cumulative histogram. Since numbers are just numbers, you can sort the array not by rearranging the initial values into sorted order, but just by generating a new list of numbers based on a frequency histogram. For example, here's how you might do that:

    #include <iostream>
    #include <vector>
    
    int main() {
        /* Initialize histogram. Valid keys are 0 .. kMaxValue - 1. */
        const unsigned kMaxValue = 100;
        std::vector<unsigned> counts(kMaxValue);
    
        /* Read values. */
        const unsigned kNumValues = 100;
        for (unsigned i = 0; i < kNumValues; i++) {
            unsigned nextValue;
            std::cin >> nextValue; // Don't forget error-checking!
    
            counts[nextValue]++;
        }
    
        /* Output values: regenerate each number from its count. */
        std::vector<unsigned> result;
        result.reserve(kNumValues);
        for (unsigned i = 0; i < counts.size(); i++) {
            for (unsigned j = 0; j < counts[i]; j++) {
                result.push_back(i);
            }
        }
    }
    

    Notice that the numbers that are added to the result vector aren't read from the input, but are instead generated just by using the loop counter.

    In the second case - where you have elements to sort and each element has a key - it's not possible to use the above approach because you can't regenerate the elements just by counting. Instead, you're going to need to do something more clever that actually involves rearranging the elements from the input sequence.

    This is where the frequency histogram idea comes in. The basic idea is the following - we want to determine, for each element in the input array, the index at which that element should end up in the sorted array. Let's imagine that we start by getting the frequency histogram H for the input array. This histogram has the property that H[i] tells us how many elements there are with key i. Now, suppose we make a cumulative frequency histogram C where C[i] = H[0] + H[1] + ... + H[i]. In this case, C[i] tells us how many elements in the input array have a key less than or equal to i.
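
    As a quick illustration (the keys and counts below are made up for the example), here's a minimal sketch of turning H into C with a running sum:

    #include <iostream>
    #include <vector>
    
    int main() {
        /* Made-up example: input keys {2, 0, 2, 1}, so the frequency
           histogram is H = {1, 1, 2} for keys 0, 1, and 2. */
        std::vector<unsigned> c = {1, 1, 2};
    
        /* Running sum turns H into the cumulative histogram C = {1, 2, 4}:
           one key <= 0, two keys <= 1, four keys <= 2. */
        for (unsigned i = 1; i < c.size(); i++) {
            c[i] += c[i - 1];
        }
    
        for (unsigned count : c) {
            std::cout << count << ' '; // prints "1 2 4"
        }
        std::cout << '\n';
    }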

    Imagine that you have just the input array and the cumulative frequency histogram. What could you do with it? Well, suppose that you have some element A[i] from the original array. Based on the cumulative frequency histogram, we know that there are C[key(A[i])] elements in the array whose key is less than or equal to key(A[i]). Therefore, if we wanted to reorder the input array so that everything was in sorted order, we could put element A[i] at position C[key(A[i])] - 1: exactly C[key(A[i])] elements (counting A[i] itself) have keys less than or equal to its key, so the last of those positions is index C[key(A[i])] - 1. Assuming that there are no duplicated keys in the array, iterating across the array A and repositioning everything according to this formula would correctly put the array in sorted order.

    Things are a bit more complicated if we have duplicates. Suppose there are two elements A[i] and A[j] where key(A[i]) = key(A[j]). In that case, we can't put both elements at position C[key(A[i])] - 1, since they'd collide. However, we could do the following. We'll put one of the elements at position C[key(A[i])] - 1, then destructively modify the C array by subtracting 1 from C[key(A[i])]. Then, when we see element A[j], we'll put it at position C[key(A[j])] - 1, which is an empty slot. Intuitively, the whole idea of having a cumulative frequency histogram is to be able to instantly know where to position objects by storing how many objects will come before any particular item with a given key. Every time we see an item with some key, we want to indicate that for any future item with the same key, there will be one fewer item coming before it.

    So why scan backwards? We could just as easily have done a forward scan, but the advantage of a backwards scan is that it sorts elements stably. That is, if you have multiple elements with the same key, they'll all appear in the same relative order in the output sequence as they would in the input sequence.
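
    As a small keys-only sketch of that placement-and-decrement step (the input keys below are made up), before the fuller key/value version:

    #include <iostream>
    #include <vector>
    
    int main() {
        /* Made-up input keys, with a duplicate. */
        std::vector<unsigned> a = {2, 0, 2, 1};
    
        /* Cumulative histogram for keys 0..2, computed as described above. */
        std::vector<unsigned> c = {1, 2, 4};
    
        /* Scan backwards; place each element at index c[key] - 1, then
           decrement c[key] so the next element with that key lands one
           slot earlier. Scanning backwards keeps the sort stable. */
        std::vector<unsigned> output(a.size());
        for (unsigned i = a.size(); i-- > 0; ) {
            output[--c[a[i]]] = a[i];
        }
    
        for (unsigned x : output) {
            std::cout << x << ' '; // prints "0 1 2 2"
        }
        std::cout << '\n';
    }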

    Here's some code showing off how you can use this:

    #include <iostream>
    #include <string>
    #include <utility>
    #include <vector>
    
    int main() {
        /* Initialize histogram. Valid keys are 0 .. kMaxValue - 1. */
        const unsigned kMaxValue = 100;
        std::vector<unsigned> counts(kMaxValue);
    
        /* Read values. Each value will be a string with an associated key. */
        const unsigned kNumValues = 100;
        std::vector<std::pair<unsigned, std::string>> elems;
        elems.reserve(kNumValues);
    
        for (unsigned i = 0; i < kNumValues; i++) {
            unsigned key;
            std::cin >> key;   // Don't forget error-checking!
    
            std::string value;
            std::cin >> value; // Don't forget error-checking!
            elems.push_back(std::make_pair(key, value));
    
            counts[key]++;
        }
    
        /* Transform histogram into cumulative histogram. */
        for (unsigned i = 1; i < counts.size(); i++) {
            counts[i] += counts[i - 1];
        }
    
        /* Place values: scan backwards so the sort is stable. */
        std::vector<std::string> result(kNumValues);
        for (unsigned i = elems.size() - 1; i < elems.size(); i--) { // See note
            result[--counts[elems[i].first]] = elems[i].second;
        }
    }
    

    The loop is a bit weird due to counting down with unsigned values. This question details how to handle that properly.
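
    For what it's worth, one common way to write that countdown (just one option among several) is to do the decrement in the loop condition, so the final loop above would look like this:

    /* Same placement loop as above, but decrementing in the condition:
       i takes the values elems.size() - 1 down to 0, then the loop stops,
       with no unsigned wraparound to reason about. */
    for (unsigned i = elems.size(); i-- > 0; ) {
        result[--counts[elems[i].first]] = elems[i].second;
    }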