Merge sort has a worst-case complexity of O(N log N), while quick sort's worst case is O(N^2), so in theory merge sort should perform better. But I have heard that, due to copy overhead, quick sort outperforms merge sort in most cases. See the reference.
Then I decided to implement and test; below is my full source code in C:
#include <stdio.h>
#include <time.h>

#define SZ 10000000
#define MOD 10000007
#define i64 long long int

i64 nums[SZ];
i64 L[SZ], R[SZ]; /* scratch arrays for merge sort */
i64 seed = 0xff;

/* simple pseudo-random generator (note: this shadows the standard srand) */
i64 srand(){
    seed = (seed + 17 * seed) % MOD;
    return seed;
}

void make(){
    for (register int i = 0; i < SZ; i++)
        nums[i] = srand() % MOD;
}
void swap(i64 *a, i64 *b){
    i64 t = *a;
    *a = *b;
    *b = t;
}
/* Lomuto partition: move the pivot to the end, sweep smaller elements forward */
int pivot(int s, int e){
    //int p = s + srand() % (e - s + 1); /* random element as pivot */
    int p = s + (e - s) / 2;             /* middle element as pivot */
    //int p = s;                         /* first element as pivot */
    //int p = e;                         /* last element as pivot */
    i64 v = nums[p];
    int c = s;
    swap(nums + p, nums + e);
    for (register int i = s; i < e; i++){
        if (nums[i] < v){
            swap(nums + i, nums + c);
            c++;
        }
    }
    swap(nums + c, nums + e);
    return c;
}
void qsort(int s, int e){
    if (s < e){
        int p = pivot(s, e);
        qsort(s, p - 1);
        qsort(p + 1, e);
    }
}
void merge(i64 arr[], int l, int m, int r){
    int i, j, k;
    int n1 = m - l + 1;
    int n2 = r - m;
    /* copy the two halves into the global scratch arrays */
    for (i = 0; i < n1; i++)
        L[i] = arr[l + i];
    for (j = 0; j < n2; j++)
        R[j] = arr[m + 1 + j];
    /* merge them back into arr[l..r] */
    i = 0;
    j = 0;
    k = l;
    while (i < n1 && j < n2){
        if (L[i] <= R[j]){
            arr[k] = L[i];
            i++;
        } else {
            arr[k] = R[j];
            j++;
        }
        k++;
    }
    while (i < n1){
        arr[k] = L[i];
        i++;
        k++;
    }
    while (j < n2){
        arr[k] = R[j];
        j++;
        k++;
    }
}
void mergeSort(i64 arr[], int l, int r){
    if (l < r){
        int m = l + (r - l) / 2;
        mergeSort(arr, l, m);
        mergeSort(arr, m + 1, r);
        merge(arr, l, m, r);
    }
}
void testQsort(){
    double s, e;
    make();
    s = clock();
    qsort(0, SZ - 1);
    e = clock();
    printf("qsort random: %f ms\n", (e - s) * 1000.0 / CLOCKS_PER_SEC);
    s = clock();
    qsort(0, SZ - 1); /* second run sorts already-sorted data */
    e = clock();
    printf("qsort sorted: %f ms\n", (e - s) * 1000.0 / CLOCKS_PER_SEC);
}
void testMsort(){
    double s, e;
    make();
    s = clock();
    mergeSort(nums, 0, SZ - 1);
    e = clock();
    printf("msort random: %f ms\n", (e - s) * 1000.0 / CLOCKS_PER_SEC);
    s = clock();
    mergeSort(nums, 0, SZ - 1); /* second run sorts already-sorted data */
    e = clock();
    printf("msort sorted: %f ms\n", (e - s) * 1000.0 / CLOCKS_PER_SEC);
}
int main(){
    testMsort();
    testQsort();
    return 0;
}
msort random: 4596.000000 ms
msort sorted: 3354.000000 ms
qsort random: 7637.000000 ms
qsort sorted: 5074.000000 ms
I have used four versions of quick sort (the four pivot choices shown as comments in pivot()).
None of them seems to outperform merge sort. Could anyone tell me why it is said that quick sort outperforms merge sort?
Is there anything wrong with my quick sort implementation?
Following @rcgldr's answer below, I have tested the version of quick sort shown here, and it finally outperforms every version of the merge sort:
/* Hoare partition scheme with the middle element as the pivot */
void qsort3(int s, int e){
    if (s < e){
        i64 p = nums[(s + e) / 2];
        int i = s - 1;
        int j = e + 1;
        while (1){
            while (nums[++i] < p); /* skip elements already on the correct side */
            while (nums[--j] > p);
            if (i >= j) break;     /* pointers crossed: partition complete */
            swap(nums + i, nums + j);
        }
        qsort3(s, j);
        qsort3(j + 1, e);
    }
}
The question's quick sort example is based on the Lomuto partition scheme, which is slower than the Hoare partition scheme. Link to an example of the Hoare partition scheme:
QuickSort with middle element as pivot
The merge sort example constantly creates sub-arrays and copies data. A more efficient approach is a one-time allocation of a work array, then changing the direction of the merge based on the level of recursion for top-down merge sort, or based on the pass count for bottom-up merge sort; a sketch of this idea follows the link below. Link to Java source code showing both bottom-up and top-down merge sort, which can easily be converted to C:
'MergeSort Algorithm' - What's the better implementation in JAVA?
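Here is a minimal sketch of that idea in C, assuming the i64 type from the question's code is in scope. The names msMerge, msSplitMerge, and mergeSort2 are mine, not from the linked Java code; the point is the one-time allocation and the swapping of source/destination roles at each recursion level, which removes the per-call copying.

#include <stdlib.h>
#include <string.h>

/* merge src[l..m) and src[m..r) into dst[l..r) */
void msMerge(i64 *src, int l, int m, int r, i64 *dst){
    int i = l, j = m, k = l;
    while (i < m && j < r)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < m) dst[k++] = src[i++];
    while (j < r) dst[k++] = src[j++];
}

/* sort src[l..r) so the result ends up in dst[l..r);
   each recursion level flips the roles of the two arrays,
   so no per-call copying is needed */
void msSplitMerge(i64 *src, int l, int r, i64 *dst){
    if (r - l < 2)
        return;                   /* one element: dst already holds it */
    int m = l + (r - l) / 2;
    msSplitMerge(dst, l, m, src); /* sort the halves into src ... */
    msSplitMerge(dst, m, r, src);
    msMerge(src, l, m, r, dst);   /* ... then merge them into dst */
}

void mergeSort2(i64 *a, int n){
    i64 *b = malloc(n * sizeof *b); /* one-time allocation */
    memcpy(b, a, n * sizeof *b);    /* both arrays start identical */
    msSplitMerge(b, 0, n, a);       /* final result lands in a */
    free(b);
}

With this version, the mergeSort(nums, 0, SZ - 1) calls in testMsort() would become mergeSort2(nums, SZ).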
As for the relative performance, a simple quick sort such as the one linked in this answer is about 15% faster than a basic merge sort when sorting an array of simple elements like integers or floating point numbers. However, if the quick sort is enhanced to avoid its worst-case O(n^2) time complexity, the advantage shrinks, and the main remaining advantage is that it doesn't require the O(n) space that merge sort needs for its merge operations. In general, merge sort does more moves but fewer compares than quick sort. If sorting an array of pointers to objects, the compare overhead becomes greater than the cost of moving pointers, and merge sort ends up being faster. On the other hand, sorting an array of pointers to objects involves randomly accessing those objects, which isn't cache friendly, and it is faster to sort the objects themselves rather than the pointers unless the objects are fairly large (the trade-off is typically around 128 to 256 bytes, depending on the system).
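To illustrate that last trade-off, here is a hypothetical sketch using the standard library's qsort; the Obj struct, its payload size, and both comparators are assumptions for illustration, not taken from the answer. Sorting the objects moves whole structs per swap but compares with sequential access; sorting pointers moves only pointer-sized values per swap, but every compare has to chase two pointers into memory.

#include <stdlib.h>

typedef struct {
    i64 key;
    char payload[120]; /* assumed size, near the cited ~128-byte trade-off */
} Obj;

/* compare objects in place: sequential access, but big moves */
int cmpObj(const void *a, const void *b){
    const Obj *x = a, *y = b;
    return (x->key > y->key) - (x->key < y->key);
}

/* compare through pointers: small moves, but each compare is a random access */
int cmpPtr(const void *a, const void *b){
    const Obj *x = *(const Obj * const *)a;
    const Obj *y = *(const Obj * const *)b;
    return (x->key > y->key) - (x->key < y->key);
}

/* qsort(objs, n, sizeof(Obj),   cmpObj);  sorts the objects themselves */
/* qsort(ptrs, n, sizeof(Obj *), cmpPtr);  sorts an array of pointers   */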