I am testing it for various distributions, and I get a segmentation fault when a reverse-sorted (descending) array is given as input. Sometimes it works even on a reverse-sorted array, and sometimes it crashes, particularly on large reverse-sorted arrays (>100000 elements). Is this caused by very deep recursive calls? If so, what is the limit on recursion depth, what factors does it depend on, and what precautions should be taken when writing recursive programs?
int partation(double *a, int lower, int upper) {
    int i = lower, j = upper;
    double t = a[lower], temp;
    while (i <= j) {
        while (i <= upper && a[i] <= t)
            i++;
        while (a[j] > t)
            j--;
        if (i <= j) {
            temp = a[i];
            a[i] = a[j];
            a[j] = temp;
        }
    }
    temp = a[lower];
    a[lower] = a[j];
    a[j] = temp;
    return j;
}

void quick_sort(double *a, int lower, int upper) {
    int j;
    if (lower >= upper)
        return;
    else {
        j = partation(a, lower, upper);
        quick_sort(a, lower, j - 1);
        quick_sort(a, j + 1, upper);
    }
}
You have a segmentation fault because of a stack overflow.
A reverse-sorted array is the worst case for your partition scheme: you recurse n-1 times on an array of n elements.
You can try to improve the partition scheme to avoid this pathological case, but there is also a way to avoid deep recursion in the quick_sort function itself: recurse on the smaller half and loop on the larger one. The pathological case will still be very slow, but it will no longer crash.
Here is the code:
#include <stdio.h>
#include <time.h>

int partition(double *a, int lower, int upper) {
    /* assuming lower < upper */
    int i = lower, j = upper;
    double t = a[lower], temp;
    while (i <= j) {
        while (i <= upper && a[i] <= t)
            i++;
        while (a[j] > t)
            j--;
        if (i < j) {
            temp = a[i];
            a[i] = a[j];
            a[j] = temp;
        }
    }
    temp = a[lower];
    a[lower] = a[j];
    a[j] = temp;
    return j;
}

void quick_sort(double *a, int lower, int upper) {
    while (lower < upper) {
        int j = partition(a, lower, upper);
        if (j - lower < upper - j) {
            if (lower < j - 1) {
                quick_sort(a, lower, j - 1);
            }
            lower = j + 1;
        } else {
            if (j + 1 < upper) {
                quick_sort(a, j + 1, upper);
            }
            upper = j - 1;
        }
    }
}

int main(void) {
    double a[65536];
    for (int n = 2; n <= (int)(sizeof(a) / sizeof(*a)); n += n) {
        for (int i = 0; i < n; i++) {
            a[i] = n - i;
        }
        clock_t t = clock();
        quick_sort(a, 0, n - 1);
        t = clock() - t;
        for (int i = 1; i < n; i++) {
            if (a[i - 1] > a[i]) {
                printf("out of order at %d: %f > %f\n", i - 1, a[i - 1], a[i]);
                return 1;
            }
        }
        printf("a[%d] sorted in %.3f msec\n", n, (double)t * 1000.0 / CLOCKS_PER_SEC);
    }
    return 0;
}
Output:
a[2] sorted in 0.002 msec
a[4] sorted in 0.001 msec
a[8] sorted in 0.001 msec
a[16] sorted in 0.001 msec
a[32] sorted in 0.000 msec
a[64] sorted in 0.002 msec
a[128] sorted in 0.006 msec
a[256] sorted in 0.021 msec
a[512] sorted in 0.074 msec
a[1024] sorted in 0.287 msec
a[2048] sorted in 1.185 msec
a[4096] sorted in 5.104 msec
a[8192] sorted in 19.497 msec
a[16384] sorted in 78.140 msec
a[32768] sorted in 297.297 msec
a[65536] sorted in 1175.032 msec
Regarding your second question (what factors does the recursion limit depend on, and what precautions should be taken when writing recursive programs?), there is no definitive answer:
The limit on stack space (if there is such a concept) is implementation specific; different systems have very different limits, from just a few kilobytes on small embedded systems to several megabytes on large systems.
Recursing thousands of times is risky, and so is defining large arrays with automatic storage. In the above code, I define an array of 64K doubles; that is not a problem on modern general-purpose computers, but it could be too much on a small digital sensor. With the recurse-on-the-smaller-half approach, the maximum recursion depth is bounded by log2(N), which is only 16 for N=65536 and should be OK on all systems.