Given a list containing k
sublists, let
A={{a1,a2,a3,...},{b1,b2,b3,...},...}
, I want to sort those sublists by their Total[A[i]]
. Is there any efficient way to make this?
Note that in many cases, sorting based on Ordering
can be faster than SortBy
, because it allows us to leverage vectorization. In this particular case, the speed-up is not that huge:
In[50]:= test = RandomInteger[10,{5000000,5}];
In[54]:= (res1=SortBy[test,{Total}]);//Timing
(res2 = test[[Ordering[Total[test,{2}]]]]);//Timing
res1===res2
Out[54]= {1.422,Null}
Out[55]= {1.125,Null}
Out[56]= True
But, this is because Total
is a built-in function. The whole reason why SortBy
was introduced, is efficiency (that is, for a single comparison function. For several comparison functions as tie-breakers, also convenience). It is more efficient that Sort
, because it is more specific, and therefore by-passes more steps in the main evaluation sequence. However, there is no way for SortBy
to leverage the possible listability (vectorized nature) of the function on which the sorting is based - it is applied to list elements one-by-one. The solution with ordering does explicitly leverage the possibility of whole-sale computation of the sorting function (in this case, Total[#,{2}]&
does this), and therefore is faster.
If, for example, the task would be to sort according to a total of the second, third and fourth element in each sublist, we'd see a larger performance difference:
In[60]:= (res3=SortBy[test,{Total[#[[2;;4]]]&}]);//Timing
(res4=test[[Ordering[Total[test[[All,2;;4]],{2}]]]]);//Timing
res3==res4
Out[60]= {2.39,Null}
Out[61]= {1.11,Null}
Out[62]= True
Generally, the performance boost will be the largest for sorting functions which are both computationally intensive and vectorized, and such that they are much faster when applied to the whole list. Note, however, that the performance boost for sorting will never be as large as for the sorting function itself, for large lists. This is because of the intrinsic complexity of sorting, which is proportional to n*Log[n]
for large lists of length n
, and this complexity will always be there.