Scala: equivalent of np.digitize to bucketize data


In Scala I have some data, for example:

val values = Seq(0, 2, 10, 50)

And I have defined buckets, e.g.:

import scala.collection.immutable.TreeMap

val buckets = TreeMap[Int, Double]((0, -0.001),
                                   (1, 1.5),
                                   (2, 5.0),
                                   (3, 20.0))

Then, from my values, I want to get the bucket indexes, e.g.:

val result = Seq(0, 1, 2, 3)

In Python this can be done with np.digitize, but in Scala I can't find an equivalent in Nd4j or Breeze.

Is there an optimized solution to this?


Solution

  • Maybe you are using an older Breeze version. If you include:

    libraryDependencies += "org.scalanlp" %% "breeze" % "0.13.2"
    

    in your .sbt file, you can use digitize from breeze.stats. For example:

    import breeze.stats._
    
    val arr1 = Array(-3.0, 0.5, 1.0, 1.5, 4.0) // values to bin
    val arr2 = Array(0.0, 1.0, 2.0)            // bin edges, written as Doubles
    
    digitize(arr1, arr2)
    

    It gives

    Array(0, 1, 1, 2, 3)
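
If adding Breeze is not an option, the same lookup can probably be done with the standard library alone. The following is only a minimal sketch, assuming the bucket lower bounds are distinct and sorted ascending, as in the question. `Bucketize` and `bucketIndex` are hypothetical names made up for illustration, not part of Breeze or Nd4j. It runs a binary search via scala.collection.Searching and returns the index of the last lower bound that is <= the value, or -1 when the value lies below every bound.

```scala
import scala.collection.Searching._

object Bucketize {
  // Bucket lower bounds from the question, sorted ascending (position = bucket index).
  val lowerBounds: Vector[Double] = Vector(-0.001, 1.5, 5.0, 20.0)

  // Hypothetical helper: index of the last bound <= x, or -1 if x is below all bounds.
  def bucketIndex(x: Double, bounds: IndexedSeq[Double]): Int =
    bounds.search(x) match {
      case Found(i)          => i     // x equals a bound, so it opens bucket i
      case InsertionPoint(i) => i - 1 // x lies between bound i - 1 and bound i
    }

  val values = Seq(0, 2, 10, 50)
  val result: Seq[Int] = values.map(v => bucketIndex(v.toDouble, lowerBounds))
  // result == Seq(0, 1, 2, 3)
}
```

Each lookup is O(log n) in the number of buckets. Going by the Breeze output above, digitize returns 0 for a value below the first edge, so on the question's data its indexes would presumably be one higher than the ones requested, while this sketch returns the requested ones directly.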