Search code examples
performancealgorithmwolfram-mathematicabinbinning

Mathematica fast 2D binning algorithm


I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form T={{x1,y1,z1},{x2,y2,z2},....} and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the Z values that fall into each bin.

Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a list of values occupying that bin. At the end I map Total onto the list of bins, summing their contents (I do this because I sometimes want to do other things, like maximize).

I have tried using Gather and other such functions to do this but the above method was ridiculously faster, though perhaps I am using Gather poorly. Anyway It still takes a few minutes to do the sorting by my method and I feel like Mathematica can do better. Does anyone have a nice efficient algorithm handy?


Solution

  • Here is a method based on Szabolcs's post that is about about an order of magnitude faster.

    data = RandomReal[5, {500000, 3}];
    (*500k values*)
    zvalues = data[[All, 3]];
    
    epsilon = 1*^-10;(*prevent 101 index*)
    (*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
    indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
    
    res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]}, 
        SparseArray[
         gb[[All, 1, 1]] -> 
          Total[gb[[All, All, 2]], {2}]]]; // AbsoluteTiming
    

    Gives about {2.012217, Null}

    AbsoluteTiming[
     System`SetSystemOptions[ 
      "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
     res3 = SparseArray[indexes -> zvalues];
     System`SetSystemOptions[ 
      "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
     ]
    

    Gives about {0.195228, Null}

    res3 == res2
    True
    

    "TreatRepeatedEntries" -> 1 adds duplicate positions up.