Search code examples
c++algorithmsortingradix-sort

Hollerith's Radix Sort


I have implemented the normal Radix Sort:

#include <iostream>

using namespace std;

void print(int arr[], int n) {
    for (int i = 0; i < n; i++) {
        cout << arr[i] << " ";
    }
    cout << endl;
}

int findMax(int arr[], int n) {
    int mx = 0;
    for (int i = 0; i < n; i++) {
        if (arr[i] > mx)
            mx = arr[i];
    }
    return mx;
}

void countingSort(int arr[], int n, int exp) {
    int output[n];
    const int m = findMax(arr, n) + 1;
    int C[m];
    for (int i = 0; i < m; i++) {
        C[i] = 0;
    }

    for (int i = 0; i < n; i++)
        C[(arr[i] / exp) % 10]++;

    for (int i = 1; i < 10; i++)
        C[i] += C[i - 1];

    for (int i = n - 1; i >= 0; i--) {
        output[C[(arr[i] / exp) % 10] - 1] = arr[i];
        C[(arr[i] / exp) % 10]--;
    }

    for (int i = 0; i < n; i++)
        arr[i] = output[i];
}

void radixSotr(int arr[], int n) {
    int m = findMax(arr, n);
    for (int exp = 1; m / exp > 0; exp *= 10) {
        countingSort(arr, n, exp);
    }
}

int main() {
    int n;
    cout << "Enter the number of elements: ";
    cin >> n;
    int arr[n];
    cout << "Enter the elements of the array: ";
    for (int i = 0; i < n; i++) {
        cin >> arr[i];
    }
    cout << endl;
    cout << "Unsorted version of the array: " << endl;
    print(arr, n);
    cout << endl;
    cout << "Sorted version of the array: " << endl;
    radixSotr(arr, n);
    print(arr, n);

    return 0;
}

Now I am trying to implement Hollerith's version of Radix Sort, where the Radix Sort starts with the most significant bit and propagates iteratively to the least significant bit. Could you give me any ideas how to modify my code, because I am stuck.


Solution

  • Your countingSort function has a problem:

    • you should use an array of 10 indexes for counting instead of finding the largest element and declaring int C[m]. Your current code allocates a potentially huge array in automatic storage, invoking undefined behavior.

    Here is a corrected version:

    void countingSort(int arr[], int n, int exp) {
        int output[n];
        int C[10] = { 0 };
    
        for (int i = 0; i < n; i++)
            C[(arr[i] / exp) % 10]++;
    
        for (int i = 1; i < 10; i++)
            C[i] += C[i - 1];
    
        for (int i = n - 1; i >= 0; i--) {
            output[--C[(arr[i] / exp) % 10]] = arr[i];
            C[(arr[i] / exp) % 10]--;
        }
    
        for (int i = 0; i < n; i++)
            arr[i] = output[i];
    }
    

    Note that this algorithm cannot sort an array with negative numbers.

    The Hollerith algorithm uses least significant digit to most significant digit. It was invented for sorting US Census data tabulated on punched cards using tabulating machines. This is a very early example of computing for data processing that goes back to 1887. Punch cards used 2 different character encoding schemes named H-code and T-code all the way to the end of the 20th century, H standing for Herman Hollerith, inventor of these sorting machines, who died in 1929. (see http://ed-thelen.org/comp-hist/Knuth-Sort.html )

    For the most significant bit down to the least significant bit, you need recursion, not an iterative method like the one you have:

    • Find the maximum value, hence the maximum exponent to get the most significant digit.
    • Sort the array according to the current digit
    • For each bucket of elements with the same digit at the current position:
      • if the bucket is empty or has only one element, it is sorted
      • otherwise, recurse on the bucket for the next lesser digit, using exp/10.

    You can do this with any base >= 2.