I have a question about the quicksort algorithm. I implemented quicksort and have been experimenting with it.
The elements of the initial unsorted array are random numbers chosen from a certain range.
I find that the range of the random numbers affects the running time. For example, sorting 1,000,000 random numbers chosen from the range 1-2,000 takes 40 seconds, while it takes only 9 seconds if the 1,000,000 numbers are chosen from the range 1-10,000.
But I do not know how to explain it. In class we talked about how the pivot value can affect the depth of the recursion tree.
In my implementation, the last element of the array is chosen as the pivot; I do not use a randomized scheme to select the pivot.
#include <iostream>
#include <vector>
#include <cstdlib>
#include <ctime>
using namespace std;

// Lomuto partition: the last element vec[r] is the pivot.
int partition(vector<int> &vec, int p, int r) {
    int x = vec[r];            // pivot value
    int i = p - 1;             // end of the "<= pivot" region
    for (int j = p; j < r; j++) {
        if (vec[j] <= x) {     // elements equal to the pivot also go left
            i = i + 1;
            int temp = vec[j];
            vec[j] = vec[i];
            vec[i] = temp;
        }
    }
    // move the pivot into its final position
    int temp = vec[i + 1];
    vec[i + 1] = vec[r];
    vec[r] = temp;
    return i + 1;
}

void quicksort(vector<int> &vec, int p, int r) {
    if (p < r) {
        int q = partition(vec, p, r);
        quicksort(vec, p, q - 1);
        quicksort(vec, q + 1, r);
    }
}

// Fill the array with random numbers in 1..10000
// (use % 2000 for the slower test case).
void random_generator(int num, int *array) {
    srand((unsigned)time(0));
    for (int index = 0; index < num; index++) {
        array[index] = (rand() % 10000) + 1;
    }
}

int main() {
    const int array_size = 1000000;
    // allocate on the heap: a million ints can overflow the default stack
    int *input_array = new int[array_size];
    random_generator(array_size, input_array);
    vector<int> vec(input_array, input_array + array_size);
    delete[] input_array;

    clock_t t1 = clock();
    quicksort(vec, 0, array_size - 1);   // call quick sort
    clock_t t2 = clock();

    float diff = (float)t2 - (float)t1;
    cout << diff << endl;
    cout << diff / CLOCKS_PER_SEC << endl;
    return 0;
}
Most likely it's not performing well because quicksort doesn't handle lots of duplicates very well and may still end up swapping them (the order of key-equal elements isn't guaranteed to be preserved). In particular, with a Lomuto-style partition like yours, every element equal to the pivot lands on the same side of the split, so the more duplicates there are, the more unbalanced the recursion gets. You'll notice that the number of duplicates per value is 100 for the 1-10,000 range versus 500 for the 1-2,000 range, while the time factor is also approximately 5.
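To see why that hurts, here is a small sketch (partition_lomuto is just my name for a compact copy of your partition): on an array where every key is equal, each call puts the pivot at the very end of its range, so every "split" peels off only one element, which is the degenerate pattern that a higher duplicate count pushes you toward.

#include <iostream>
#include <utility>
#include <vector>
using namespace std;

// Compact copy of the question's Lomuto-style partition (last element = pivot).
int partition_lomuto(vector<int> &vec, int p, int r) {
    int x = vec[r];
    int i = p - 1;
    for (int j = p; j < r; j++)
        if (vec[j] <= x)
            swap(vec[++i], vec[j]);
    swap(vec[i + 1], vec[r]);
    return i + 1;
}

int main() {
    vector<int> all_equal(10, 7);                  // ten copies of the same key
    // Simulate the recursion on [0, r]: the right part is always empty.
    for (int r = (int)all_equal.size() - 1; r > 0; r--) {
        int q = partition_lomuto(all_equal, 0, r);
        cout << "range [0," << r << "] splits at " << q << endl;  // q == r every time
    }
    return 0;
}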
Have you averaged the runtimes over at least 5-10 runs for each range, to give it a fair shot at getting good starting pivots?
As a comparison, have you checked how std::sort and std::stable_sort perform on the same data sets?
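In case it helps, here is a rough sketch combining both suggestions: average a few runs on fresh random data and time std::sort and std::stable_sort on copies of the same input. quicksort_q/partition_q are just compact copies of the code from the question, and the run count and range are arbitrary choices.

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

int partition_q(vector<int> &vec, int p, int r) {
    int x = vec[r], i = p - 1;
    for (int j = p; j < r; j++)
        if (vec[j] <= x)
            swap(vec[++i], vec[j]);
    swap(vec[i + 1], vec[r]);
    return i + 1;
}

void quicksort_q(vector<int> &vec, int p, int r) {
    if (p < r) {
        int q = partition_q(vec, p, r);
        quicksort_q(vec, p, q - 1);
        quicksort_q(vec, q + 1, r);
    }
}

double seconds(clock_t t1, clock_t t2) { return double(t2 - t1) / CLOCKS_PER_SEC; }

int main() {
    const int n = 1000000, runs = 5, range = 2000;   // also try range = 10000
    double t_quick = 0, t_sort = 0, t_stable = 0;
    srand((unsigned)time(0));
    for (int run = 0; run < runs; run++) {
        vector<int> base(n);
        for (int i = 0; i < n; i++)
            base[i] = rand() % range + 1;            // fresh data every run

        vector<int> a = base, b = base, c = base;    // identical input for each sort

        clock_t t1 = clock();
        quicksort_q(a, 0, n - 1);
        t_quick += seconds(t1, clock());

        t1 = clock();
        sort(b.begin(), b.end());
        t_sort += seconds(t1, clock());

        t1 = clock();
        stable_sort(c.begin(), c.end());
        t_stable += seconds(t1, clock());
    }
    cout << "quicksort (question): " << t_quick / runs << " s" << endl;
    cout << "std::sort:            " << t_sort / runs << " s" << endl;
    cout << "std::stable_sort:     " << t_stable / runs << " s" << endl;
    return 0;
}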
Finally, for this distribution of data (unless this is specifically a quicksort exercise) I think counting sort would be much better: 40 KB of memory to store the counts, and it runs in O(n).
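For reference, a minimal counting-sort sketch under the assumption that all values lie in 1..10000 as in the question (counting_sort and max_value are names I made up): one pass tallies the values, a second pass writes them back in order, so the work is O(n + k) with k counters of extra memory (about 40 KB for k = 10,000).

#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;

// Sort a vector whose values are known to lie in 1..max_value.
void counting_sort(vector<int> &vec, int max_value) {
    vector<int> count(max_value + 1, 0);
    for (size_t i = 0; i < vec.size(); i++)
        count[vec[i]]++;                       // tally each value
    size_t out = 0;
    for (int v = 1; v <= max_value; v++)
        for (int c = 0; c < count[v]; c++)
            vec[out++] = v;                    // rewrite the values in order
}

int main() {
    vector<int> vec;
    for (int i = 0; i < 20; i++)
        vec.push_back(rand() % 10000 + 1);     // same kind of data as the question
    counting_sort(vec, 10000);
    for (size_t i = 0; i < vec.size(); i++)
        cout << vec[i] << " ";
    cout << endl;
    return 0;
}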