I want to parallelize a for loop which contains a nested comparison function for qsort:
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(){
    int i;
    #pragma omp parallel for
    for(i = 0; i < 100; i++){
        int *index = (int *) malloc(sizeof(int)*10);
        double *tmp_array = (double*) malloc(sizeof(double)*10);
        int j;
        for(j = 0; j < 10; j++){
            tmp_array[j] = rand();
            index[j] = j;
        }
        // QuickSort the index array based on tmp_array:
        int simcmp(const void *a, const void *b){
            int ia = *(int *)a;
            int ib = *(int *)b;
            if ((tmp_array[ia] - tmp_array[ib]) > 1e-12){
                return -1;
            }else{
                return 1;
            }
        }
        qsort(index, 10, sizeof(*index), simcmp);
        free(index);
        free(tmp_array);
    }
    return 0;
}
When I try to compile this, I get the error:
internal compiler error: in get_expr_operands, at tree-ssa-operands.c:881
}
As far as I can tell, this error is due to the nested comparison function. Is there a way to make OpenMP work with this nested comparison function? If not, is there a good way to achieve a similar result without a nested comparison function?
Edit: I'm using the GNU C compiler, where nested functions are permitted as an extension. The code compiles and runs fine without the pragma statement. I can't define simcmp outside of the for loop because tmp_array would then have to be a global variable, which would mess up the multi-threading. However, if somebody has a suggestion to achieve the same result without a nested function, that would be most welcome.
I realize this has been self-answered, but here are some standard C and OpenMP options. The qsort_r function is a good classic choice, but it's worth noting that qsort_s is part of the C11 standard (in the optional Annex K), and is thus portable wherever C11 is offered (which does not include Windows; they don't quite offer C99 yet).
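For illustration, here is a minimal sketch of the qsort_r route, assuming glibc's signature (BSD/macOS and Windows order the arguments differently, so it is not fully portable); the comparator name simcmp_r is mine, and the rest of the loop just mirrors the question's code:
#define _GNU_SOURCE   /* expose glibc's qsort_r */
#include <stdlib.h>

/* Three-argument comparator: the context pointer carries tmp_array,
   so no nested function or global variable is needed. */
static int simcmp_r(const void *a, const void *b, void *ctx){
    const double *vals = ctx;
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    if ((vals[ia] - vals[ib]) > 1e-12){
        return -1;
    }else{
        return 1;
    }
}

int main(){
    int i;
    #pragma omp parallel for
    for(i = 0; i < 100; i++){
        int *index = (int *) malloc(sizeof(int)*10);
        double *tmp_array = (double *) malloc(sizeof(double)*10);
        int j;
        for(j = 0; j < 10; j++){
            tmp_array[j] = rand();
            index[j] = j;
        }
        // Sort the index array based on tmp_array, passed as the context:
        qsort_r(index, 10, sizeof(*index), simcmp_r, tmp_array);
        free(index);
        free(tmp_array);
    }
    return 0;
}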
As to doing it in OpenMP without the nested comparison function, while still using the original qsort, there are two ways that come to mind. The first is to use the classic global variable in combination with OpenMP's threadprivate directive:
#include <stdlib.h>

static int *index = NULL;
static double *tmp_array = NULL;
#pragma omp threadprivate(index, tmp_array)

int simcmp(const void *a, const void *b){
    int ia = *(const int *)a;
    int ib = *(const int *)b;
    if ((tmp_array[ia] - tmp_array[ib]) > 1e-12){
        return -1;
    }else{
        return 1;
    }
}

int main(){
    int i;
    #pragma omp parallel for
    for(i = 0; i < 100; i++){
        index = (int *) malloc(sizeof(int)*10);
        tmp_array = (double *) malloc(sizeof(double)*10);
        int j;
        for(j = 0; j < 10; j++){
            tmp_array[j] = rand();
            index[j] = j;
        }
        // QuickSort the index array based on tmp_array:
        qsort(index, 10, sizeof(*index), simcmp);
        free(index);
        free(tmp_array);
    }
    return 0;
}
The version above causes every thread in the parallel region to use a private copy of the global variables index and tmp_array, which takes care of the issue. This is probably the most portable version you can write in standard C and OpenMP, with the only likely incompatible platforms being those that do not implement thread local memory (some microcontrollers, etc.).
If you want to avoid the global variable and still have portability and use OpenMP, then I would recommend using C++11 and the std::sort algorithm with a lambda:
std::sort(index, index + 10, [=](int a, int b){
    // return true when element a should come before element b
    // (descending order, matching the original comparator)
    return (tmp_array[a] - tmp_array[b]) > 1e-12;
});
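For completeness, a minimal sketch of how that call can sit inside the parallel loop; the std::vector containers and std::iota are substitutions of mine for the malloc/free and index-filling loop in the question, the rest is unchanged:
#include <algorithm>
#include <cstdlib>
#include <numeric>
#include <vector>

int main(){
    #pragma omp parallel for
    for(int i = 0; i < 100; i++){
        std::vector<int> index(10);
        std::vector<double> tmp_array(10);
        std::iota(index.begin(), index.end(), 0);   // index = 0, 1, ..., 9
        for(double &v : tmp_array){
            v = std::rand();
        }
        // Sort the index array based on tmp_array, captured by reference:
        std::sort(index.begin(), index.end(), [&](int a, int b){
            return (tmp_array[a] - tmp_array[b]) > 1e-12;
        });
    }
    return 0;
}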