Search code examples
listwolfram-mathematicasortingmathematica-8

How to sort N lists by the total sum of elements of each list with Mathematica?


Given a list containing k sublists, let A={{a1,a2,a3,...},{b1,b2,b3,...},...}, I want to sort those sublists by their Total[A[i]]. Is there any efficient way to make this?


Solution

  • Note that in many cases, sorting based on Ordering can be faster than SortBy, because it allows us to leverage vectorization. In this particular case, the speed-up is not that huge:

    In[50]:= test = RandomInteger[10,{5000000,5}];
    
    In[54]:= (res1=SortBy[test,{Total}]);//Timing
    (res2 = test[[Ordering[Total[test,{2}]]]]);//Timing
    res1===res2
    
    Out[54]= {1.422,Null}
    Out[55]= {1.125,Null}
    Out[56]= True
    

    But, this is because Total is a built-in function. The whole reason why SortBy was introduced, is efficiency (that is, for a single comparison function. For several comparison functions as tie-breakers, also convenience). It is more efficient that Sort, because it is more specific, and therefore by-passes more steps in the main evaluation sequence. However, there is no way for SortBy to leverage the possible listability (vectorized nature) of the function on which the sorting is based - it is applied to list elements one-by-one. The solution with ordering does explicitly leverage the possibility of whole-sale computation of the sorting function (in this case, Total[#,{2}]& does this), and therefore is faster.

    If, for example, the task would be to sort according to a total of the second, third and fourth element in each sublist, we'd see a larger performance difference:

    In[60]:= (res3=SortBy[test,{Total[#[[2;;4]]]&}]);//Timing
    (res4=test[[Ordering[Total[test[[All,2;;4]],{2}]]]]);//Timing
    res3==res4
    
    Out[60]= {2.39,Null}
    Out[61]= {1.11,Null}
    Out[62]= True
    

    Generally, the performance boost will be the largest for sorting functions which are both computationally intensive and vectorized, and such that they are much faster when applied to the whole list. Note, however, that the performance boost for sorting will never be as large as for the sorting function itself, for large lists. This is because of the intrinsic complexity of sorting, which is proportional to n*Log[n] for large lists of length n, and this complexity will always be there.