python arrays numpy split dbscan

Numpy split array by grouping array


I have the following two arrays of equal length. My goal is to split array B into groups defined by array A, so the result should be three arrays or a list of arrays. The final list of arrays should consist of the following rows of array B:

  • First and second
  • Third and fifth
  • Fourth

The order is not really relevant.

A = array([[-1],
           [ 1],
           [ 0],
           [ 0],
           [ 1]])

B = array([[ 624.5   ,  548.    ],
           [ 912.8201,  564.3444],
           [1564.5   ,  764.    ],
           [1463.4163,  785.9251],
           [1698.0757,  846.6306]])

I ran into this problem while using the DBSCAN clustering function. Array A holds the cluster labels (0, 1) of the points in array B; the value -1 marks a point as an outlier. (The values used are not precise.) My goal is to calculate the compactness, ... of each found cluster.


Solution

  • The numpy_indexed package (disclaimer: I am its author) was designed with this type of use case in mind.

    import numpy_indexed as npi
    C = npi.group_by(A).split(B)
    

    I am not sure what you mean by the compactness of each group, but rather than splitting and then doing further computations on each piece, it is typically more efficient to compute reductions over the groups directly. The grouping object can be reused across reductions, so the grouping work is done only once:

    groups = npi.group_by(A)
    # each reduction returns a tuple: (unique keys, reduced values per key)
    unique, mean = groups.mean(B)
    unique, std = groups.std(B)
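
If adding a dependency is not an option, the same split can be sketched in plain NumPy. This is only a minimal illustration using the sample arrays from the question, not a description of how numpy_indexed works internally; the names `labels`, `order`, `boundaries` and `split_groups` are my own and purely illustrative.

```python
import numpy as np

A = np.array([[-1], [1], [0], [0], [1]])
B = np.array([[ 624.5   ,  548.    ],
              [ 912.8201,  564.3444],
              [1564.5   ,  764.    ],
              [1463.4163,  785.9251],
              [1698.0757,  846.6306]])

labels = A.ravel()                          # flatten (5, 1) labels to shape (5,)
order = np.argsort(labels, kind="stable")   # row indices sorted by label
sorted_labels = labels[order]

# Positions where the sorted label changes mark the group boundaries.
boundaries = np.flatnonzero(np.diff(sorted_labels)) + 1
unique_labels = sorted_labels[np.r_[0, boundaries]]

# One array per label, in ascending label order.
# For the sample A above: row 1 / rows 3 and 4 / rows 2 and 5.
split_groups = np.split(B[order], boundaries)

# Per-group mean without splitting: sum the rows per label, divide by counts.
uniq, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
sums = np.zeros((uniq.size, B.shape[1]))
np.add.at(sums, inverse, B)                 # unbuffered add, handles repeated labels
means = sums / counts[:, None]
```

The stable sort keeps the original row order inside each group. If the outliers should be excluded from the statistics, filtering with `labels != -1` before grouping is one option.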