
Python NumPy std vs Scala Breeze stddev


I'm currently migrating some Python code to Scala, using the Breeze library as a substitute for NumPy.

Everything works fine, but the two standard deviation implementations return different results:

Python:

import numpy as np

series = np.array([1, 4, 5])
np.mean(series)  # 3.3333333333333335
np.std(series)   # 1.699673171197595

Scala:

val vector = breeze.linalg.Vector[Double](Array(1.0, 4.0, 5.0))
val mean = breeze.stats.mean(vector) // 3.3333333333333335
val std = breeze.stats.stddev(vector) // 2.081665999466133

I know how to reproduce Python's behaviour in plain Scala; sample code is in the question "Scala: What is the generic way to calculate standard deviation".

However, I'm looking for a way to do it with Breeze. Any ideas?


Solution

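  • The difference comes from the degrees of freedom used in the denominator. np.std divides by n by default (ddof=0, the population standard deviation), while breeze.stats.stddev divides by n - 1 (the sample standard deviation). Indeed,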

    >>> np.std(series, ddof=1)
    2.081665999466133
    

    That is the sample standard deviation. With Breeze, one way to get the population standard deviation is to rescale the result by sqrt((n - 1) / n):

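    // Use Double division: with Ints, (n - 1) / n is 0 and the result would be 0.0
    val n   = vector.length
    val std = breeze.stats.stddev(vector) * math.sqrt((n - 1).toDouble / n)
    // ≈ 1.69967, the same as np.std(series)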
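To make the relationship concrete, here is a minimal sketch in pure Python (standard library only, so it depends on neither NumPy nor Breeze). The helper `std` is hypothetical: it follows the ddof convention described above, dividing the sum of squared deviations by n - ddof, and it checks that rescaling the sample value by sqrt((n - 1) / n) recovers the population value.

```python
import math
import statistics


def std(values, ddof=0):
    """Hypothetical helper: standard deviation dividing by (n - ddof).

    ddof=0 gives the population standard deviation (np.std's default);
    ddof=1 gives the sample standard deviation (the value Breeze's
    stddev returns in the question).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more values than ddof")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


series = [1, 4, 5]
n = len(series)

population = std(series, ddof=0)  # corresponds to np.std(series)
sample = std(series, ddof=1)      # corresponds to np.std(series, ddof=1)

# Rescaling the sample value by sqrt((n - 1) / n) gives back the population value.
rescaled = sample * math.sqrt((n - 1) / n)

print(f"population: {population:.6f}")  # population: 1.699673
print(f"sample:     {sample:.6f}")      # sample:     2.081666
print(f"rescaled:   {rescaled:.6f}")    # rescaled:   1.699673

# Cross-check against the standard library's own definitions.
assert math.isclose(population, statistics.pstdev(series))
assert math.isclose(sample, statistics.stdev(series))
assert math.isclose(rescaled, population)
```

The rescaling only makes sense for n > 1, since the sample standard deviation itself divides by n - 1.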