What does WeightedStandardDeviation (in the ExtremeOptimization library) actually *do*?


The Extreme Optimization .NET math and statistics library offers a function called WeightedStandardDeviation, for which the documentation states:

Returns the mean of the variable with observations weighted by the specified vector.

This is clearly (erroneously) copied and pasted from the documentation for the related WeightedMean method. No additional details are provided regarding the actual algorithm.

I cannot find any evidence that there is a standard definition for a weighted standard deviation. This Math.SE answer indicates that there are in fact multiple candidate definitions.


As a concrete example, I'm seeing a pretty weird answer when I actually try to use the method.

Given the following values and weights:

values = [28, 29, 30, 31, 32, 33]
weights = [0.00588121386769639, 0.107841991196409, 0.374376106764772, 0.388925647336988, 0.116838473897444, 0.00613656693669066]

I get:

WeightedStandardDeviation(values, weights) == 58926371.6549313

This is obviously absurd; no reasonable definition of weighted standard deviation should yield a value orders of magnitude greater than the total range of the sample values.

The Math.SE answer above cites this paper, which offers several candidate definitions for a weighted standard deviation. Using the first definition, I calculated a weighted SD of about 0.1285.

I also tried just multiplying each weight by 2^16 and rounding to the nearest integer, and treating these as "counts" for a normal (unweighted) standard deviation calculation. I obtained a value of about 0.878 this way.
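For reference, the count-based check described above can be sketched in plain Python as follows. This is only an illustration of that calculation, not the library's code; the variable names are made up for the example, and both the population (divide by n) and sample (divide by n - 1) forms are shown because the total count is large enough that they nearly coincide.

```python
import math

values = [28, 29, 30, 31, 32, 33]
weights = [0.00588121386769639, 0.107841991196409, 0.374376106764772,
           0.388925647336988, 0.116838473897444, 0.00613656693669066]

# Scale each weight by 2^16 and round to the nearest integer "count".
counts = [round(w * 2**16) for w in weights]

# Ordinary (unweighted) standard deviation of the data set in which
# each value is repeated `count` times, computed without expanding it.
n = sum(counts)
mean = sum(c * x for x, c in zip(values, counts)) / n
sum_sq = sum(c * (x - mean) ** 2 for x, c in zip(values, counts))

population_sd = math.sqrt(sum_sq / n)        # divide by n
sample_sd = math.sqrt(sum_sq / (n - 1))      # divide by n - 1

print(population_sd, sample_sd)  # both come out at roughly 0.878
```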

So the WeightedStandardDeviation method appears to be doing something pretty different from either of these, and, moreover, something completely bizarre and obviously incorrect. Does anyone know what the actual algorithm is and/or what it's supposed to be?


Solution

  • I asked the ExtremeOptimization support team about this, and received an answer.

    The algorithm

    The algorithm comes from this paper, which provides a definition of "weighted variance" in section 5. The WeightedStandardDeviation function returns the square root of this weighted variance.

    The bug

    The huge output value I noted above (and probably NaN results as well) was due to a bug. From the email I received:

    We found that the calculation for double contains a typo: the central sum of squares is divided by (W-1) instead of W. This would result in extremely large (or infinite) values in your example.

    This will be fixed in an update that will be released 24 April 2017.
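To see why that typo blows up on the example from the question, note that the weights there sum to almost exactly 1, so W - 1 is zero or a tiny rounding residue. The following plain-Python sketch assumes the weighted variance is the central weighted sum of squares divided by W, as the email describes; it is not the library's implementation, and `weighted_sd` and its `denominator_offset` parameter are hypothetical names used only for this illustration.

```python
import math

values = [28, 29, 30, 31, 32, 33]
weights = [0.00588121386769639, 0.107841991196409, 0.374376106764772,
           0.388925647336988, 0.116838473897444, 0.00613656693669066]


def weighted_sd(xs, ws, denominator_offset=0.0):
    """sqrt(sum(w_i * (x_i - mean)^2) / (W - denominator_offset)), W = sum(ws).

    denominator_offset=0.0 divides by W (the intended behaviour);
    denominator_offset=1.0 divides by W - 1 (the reported typo).
    """
    total = sum(ws)
    mean = sum(w * x for x, w in zip(xs, ws)) / total
    central_ss = sum(w * (x - mean) ** 2 for x, w in zip(xs, ws))
    denom = total - denominator_offset
    if denom == 0.0:
        return math.inf   # division by zero: infinite result
    if denom < 0.0:
        return math.nan   # square root of a negative number
    return math.sqrt(central_ss / denom)


fixed = weighted_sd(values, weights)        # divide by W
buggy = weighted_sd(values, weights, 1.0)   # divide by W - 1

print(sum(weights) - 1.0)  # zero or a rounding-level residue
print(fixed)               # roughly 0.878
print(buggy)               # enormous, infinite, or NaN depending on rounding
```

Dividing by W gives about 0.878, in line with the count-based estimate in the question. Dividing by W - 1 puts a value near machine epsilon in the denominator, which is consistent with the result of about 5.9e7 reported in the question.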