I know that std::sort is highly optimized; as far as I know it uses introsort (quicksort + insertion sort + heapsort). But in my tests I found that sorting an already-ascending array (0~99999) with std::sort() is faster than simply running an empty for loop 100,000 times. Even though std::sort is fast, it still has to traverse the entire array at least once, so I don't see how it can beat a bare loop with the same iteration count over the same array length. I am very confused; can someone explain the principle behind this?
What is hard to understand is that this happens only on macOS. I also tested on Linux (CentOS 7.6), and there the results are as expected. I want to know what Mac and Xcode are doing differently.
Environment: macOS (Xcode, clang++) and CentOS 7.6 (g++).
Test Code:
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <sys/time.h>

#define LENGTH 100000

// Fill arr[lo..hi) with ascending (reverse == 0) or descending values.
int *order_arr(int lo, int hi, int reverse) {
    int *arr = (int *)malloc(hi * sizeof(int));
    if (reverse == 0) {
        for (int i = lo; i < hi; ++i)
            arr[i] = i;
    } else {
        for (int i = lo; i < hi; ++i)
            arr[i] = hi - 1 - i;
    }
    return arr;
}

int main(int argc, const char *argv[]) {
    // ---- Create an ascending array: 0~99999
    int *order_array = order_arr(0, LENGTH, 0);

    timeval starttime, endtime;
    gettimeofday(&starttime, 0);
    // ------------------------------------------------------ start_time

    // ---- STL sort
    // std::sort(order_array, order_array + LENGTH);

    // ---- Only for loop 100000 times
    // for (int i = 0; i < LENGTH; ++i) ;

    // ------------------------------------------------------ end_time
    gettimeofday(&endtime, 0);
    double timeuse = 1000000.0 * (endtime.tv_sec - starttime.tv_sec)
                   + (endtime.tv_usec - starttime.tv_usec);
    std::cout << (timeuse / 1000000) << std::endl;
    return 0;
}
Running results:
macOS (Xcode): unreasonable. With or without optimization, std::sort() actually sorts the array, so its time should not be less than that of the bare for loop (0.000203 s without optimization), yet it is.

With optimization (clang++ test.cpp -std=c++11 -O3 -o test):

- for (int i = 0; i < LENGTH; ++i) ; : 0 s
- std::sort(order_array, order_array + LENGTH); : 0.000118 s

Without optimization (clang++ test.cpp -std=c++11 -o test):

- for (int i = 0; i < LENGTH; ++i) ; : 0.000203 s
- std::sort(order_array, order_array + LENGTH); : 0.000123 s

CentOS 7.6 (g++): reasonable.

With optimization (g++ test.cpp -std=c++11 -O3 -o test):

- for (int i = 0; i < LENGTH; ++i) ; : 0 s
- std::sort(order_array, order_array + LENGTH); : 0.001654 s

Without optimization (g++ test.cpp -std=c++11 -o test):

- for (int i = 0; i < LENGTH; ++i) ; : 0.0002745 s
- std::sort(order_array, order_array + LENGTH); : 0.002354 s

Here is a possible explanation:
You do not use the contents of the sorted array. clang expands the initialization and the template code inline and can determine that you are discarding the array, so it does not even generate the code to sort it, resulting in a faster time than the alternative where it does not discard the explicit empty loop.
Try making main() return the first element of the array to see if it makes a difference.
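For illustration, a minimal sketch of that experiment (timing code omitted for brevity; the only essential change is the return statement) might look like this:

#include <algorithm>
#include <cstdlib>

#define LENGTH 100000

int main() {
    int *order_array = (int *)malloc(LENGTH * sizeof(int));
    for (int i = 0; i < LENGTH; ++i)
        order_array[i] = i;

    std::sort(order_array, order_array + LENGTH);

    // The return value now depends on the sorted data, so the compiler
    // can no longer prove the array is unused and elide the sort.
    return order_array[0];
}

If the std::sort timing on macOS jumps up once the result is actually used, that confirms the sort was being eliminated as dead code.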
With your updated question, there seems to be no real issue:
You seem surprised by the performance of std::sort() on an already sorted array on macOS. It is possible that sorting is very efficient there on an already sorted array, both in increasing and decreasing order. An initial scan is used to split the array into chunks. With your dataset, the initial scan quickly yields a single chunk that is left as is or is simply reversed.
You can try and analyze the template code, which is available on both platforms either directly in the include files or in the open source libraries.
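If you would rather check that behavior empirically than read the template code, a small sketch along these lines would compare presorted and shuffled input (this uses std::chrono instead of gettimeofday, and a volatile sink as one way to keep the sort from being optimized away):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <random>
#include <vector>

static volatile int sink; // observable side effect; defeats dead-code elimination

// Time a single std::sort over a copy of the given data, in seconds.
static double time_sort(std::vector<int> v) {
    auto start = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());
    auto stop = std::chrono::steady_clock::now();
    sink = v.front();
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    std::vector<int> sorted(100000);
    for (int i = 0; i < (int)sorted.size(); ++i)
        sorted[i] = i;

    std::vector<int> shuffled = sorted;
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937{42});

    std::cout << "already sorted: " << time_sort(sorted)   << " s\n";
    std::cout << "shuffled:       " << time_sort(shuffled) << " s\n";
    return 0;
}

A large gap between the two timings would support the idea that the library's sort takes a fast path on already-ordered input.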