javascript, typescript, statistics, benchmarking

Calculate standard deviation (σ) from the previous one and a new element (cumulative/incremental SD)


A running process is being benchmarked, and the following numbers are calculated "cumulatively" (from the previous values plus the new element):

  • Minimum duration
  • Maximum duration
  • Average duration

The Standard Deviation (σ) of the duration should be calculated as well, because it measures the statistical dispersion.

In this application it is inconvenient to store each and every element, so σ has to be calculated from the previous values and the new element only.

Example:

NewElement  Min  Max  Avg  StdDev           // AllElements (which are NOT meant to be stored)
1           1    1    1    -                   [1]
2           1    2    1.5  0.5                 [1,2]
3           1    3    2    0.8164965809277     [1,2,3]
4           1    4    2.5  1.1180339887499     [1,2,3,4]
0           0    4    2    1.4142135623731     [1,2,3,4,0]

(The reference SD values were taken from an online standard deviation calculator.)
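The StdDev column is the population standard deviation (the sum of squared deviations divided by n, not n - 1). A batch computation over the stored prefixes reproduces it; this is only useful for checking results, since it needs exactly the stored elements the question wants to avoid. The helper name `populationSd` is hypothetical.

```javascript
// Batch (non-incremental) population standard deviation.
// It needs the whole array, so it only serves to verify the incremental version.
const populationSd = (values) => {
  const n = values.length;
  const mean = values.reduce((sum, x) => sum + x, 0) / n;
  // Population variance: divide by n, not n - 1
  const variance = values.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  return Math.sqrt(variance);
};

const elements = [1, 2, 3, 4, 0];
for (let i = 1; i <= elements.length; i++) {
  const prefix = elements.slice(0, i);
  console.log(prefix, populationSd(prefix));
}
```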

A simplified version of the goal would be:

const calculateNewStats = (stats, newElement) => {
  const newStats = {};
  newStats.count = stats.count + 1;
  newStats.min = Math.min(stats.min, newElement);
  newStats.max = Math.max(stats.max, newElement);
  newStats.avg = (stats.avg * stats.count + newElement) / newStats.count;

  // newStats.sd = ??? that's the problem

  return newStats;
};

// initial values
let stats = {
  count: 0,
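  min: Infinity,  // so that the first element becomes the minimum
  max: -Infinity, // so that the first element becomes the maximum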
  avg: 0,
  // initial SD is theoretically controversial (N/A), but that's not the point
  sd: 0,
};

// loopStart goes here ... an infinite one

    // many things go here ... eventually, we have a `newElement`

    stats = calculateNewStats(stats, newElement);

// loopEnd goes here

Some time has been spent searching, and some mathematical equations (like this one) were found and carefully applied, but the resulting numbers were not correct.
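For comparison, a different and commonly used incremental approach (not the one from the linked page) keeps three running totals: the count, the sum, and the sum of squares, and uses variance = mean of squares minus square of the mean. A minimal sketch, with the hypothetical names `addElement` and `sumSq`; note that this formula can lose precision badly (catastrophic cancellation) when the mean is large compared with the spread of the values.

```javascript
// Incremental population SD from three running totals:
// count, sum of values, and sum of squared values.
// Formula: variance = sumSq / n - mean^2 (prone to cancellation for large means).
const addElement = (state, x) => {
  const count = state.count + 1;
  const sum = state.sum + x;
  const sumSq = state.sumSq + x * x;
  const mean = sum / count;
  // Clamp at 0: rounding can make the difference slightly negative
  const variance = Math.max(sumSq / count - mean * mean, 0);
  return { count, sum, sumSq, sd: Math.sqrt(variance) };
};

let state = { count: 0, sum: 0, sumSq: 0, sd: 0 };
for (const x of [1, 2, 3, 4, 0]) {
  state = addElement(state, x);
  console.log(state.count, state.sd);
}
```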


Solution

  • The algorithm on the page you linked does work; here is a working implementation:

    const calculateNewStats = (stats, newElement) => {
      const newStats = {};
    
      newStats.count = stats.count + 1;
      newStats.min = Math.min(stats.min, newElement);
      newStats.max = Math.max(stats.max, newElement);
      newStats.avg = (stats.avg * stats.count + newElement) / newStats.count;
    
      newStats.sd = Math.sqrt(
        (
          (newStats.count - 1) * stats.sd * stats.sd +
          (newElement - newStats.avg) * (newElement - stats.avg)
        ) / (newStats.count)
      );
    
      return newStats;
    };
    
    // initial values
    let stats = {
      count: 0,
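      min: Infinity,  // so that the first element becomes the minimum
      max: -Infinity, // so that the first element becomes the maximum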
      avg: 0,
      sd: 0
    };
    
    let newElements = [1, 2, 3, 4, 0];
    
    for (let newElement of newElements) {
      stats = calculateNewStats(stats, newElement);
      console.log(stats);
    }
    

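Written out, the update in `calculateNewStats` appears to be the standard recurrence for the sum of squared deviations S_n, divided by n to give the population variance. Notation is mine: n is `newStats.count`, x_n is `newElement`, \bar{x}_{n-1} is `stats.avg`, \bar{x}_n is `newStats.avg`, and σ_{n-1} is `stats.sd`.

```latex
\begin{aligned}
\bar{x}_n &= \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n} \\
S_n &= \sum_{i=1}^{n} (x_i - \bar{x}_n)^2 = n\,\sigma_n^2 \\
S_n &= S_{n-1} + (x_n - \bar{x}_n)(x_n - \bar{x}_{n-1}) \\
\sigma_n^2 &= \frac{(n-1)\,\sigma_{n-1}^2 + (x_n - \bar{x}_n)(x_n - \bar{x}_{n-1})}{n}
\end{aligned}
```

For example, adding x_2 = 2 to [1] gives σ_2^2 = (1 · 0 + (2 - 1.5)(2 - 1)) / 2 = 0.25, so σ_2 = 0.5, matching the second row of the table.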

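    Maybe you missed the last sentence of that page:

    "If you want the population variance or standard deviation replace N-1 with N and N-2 with N-1."

    The table in the question uses the population standard deviation, which is what the code above computes (it divides by newStats.count).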
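If the sample standard deviation (divide by n - 1) is ever needed instead, one way is to rescale the population variance by n / (n - 1). This is a sketch under that assumption, not something from the linked page; `sampleSdFromPopulation` is a hypothetical helper.

```javascript
// Hypothetical helper: convert a population variance (divides by n)
// into a sample standard deviation (divides by n - 1).
// The sample version is undefined for a single element, so return NaN there.
const sampleSdFromPopulation = (populationVariance, count) =>
  count < 2 ? NaN : Math.sqrt((populationVariance * count) / (count - 1));

// Population variance of [1, 2] is 0.25 (SD 0.5, as in the question's table)
console.log(sampleSdFromPopulation(0.25, 2)); // 0.7071067811865476
// Population variance of [1, 2, 3, 4, 0] is 2 (SD about 1.4142)
console.log(sampleSdFromPopulation(2, 5)); // about 1.5811
```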


    Note: there will be a small loss of precision that grows as you add elements. I would advise you to:

    • store the variance in stats together with sd; right now I'm calculating the square root of the variance to get the SD, then squaring the SD to get the variance in the next iteration
    • store the total value in stats, instead of recalculating it with stats.avg * stats.count on every iteration

    You're storing 2 more numbers in stats, but you should get better precision in return.
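To illustrate the first point: in IEEE-754 double precision, taking a square root and squaring the result does not always give back the original number. A variance of 2 (the population variance of [1, 2, 3, 4, 0]) is already enough to show it:

```javascript
// Round-tripping a variance through Math.sqrt and back is not exact.
const variance = 2;
const sd = Math.sqrt(variance);
const recovered = sd * sd;

console.log(recovered);              // 2.0000000000000004
console.log(recovered === variance); // false
```

Storing the variance directly avoids introducing this rounding step on every iteration.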

    This is a better implementation:

    const calculateNewStats = (stats, newElement) => {
      const newStats = {};
      newStats.count = stats.count + 1;
      newStats.total = stats.total + newElement;
      newStats.min = Math.min(stats.min, newElement);
      newStats.max = Math.max(stats.max, newElement);
      newStats.avg = (stats.total + newElement) / newStats.count;
    
      newStats.variance = (
        (newStats.count - 1) * stats.variance +
        (newElement - newStats.avg) * (newElement - stats.avg)
      ) / (newStats.count);
    
      newStats.sd = Math.sqrt(newStats.variance);
    
      return newStats;
    };
    
    // initial values
    let stats = {
      count: 0,
      total: 0,
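      min: Infinity,  // so that the first element becomes the minimum
      max: -Infinity, // so that the first element becomes the maximum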
      avg: 0,
      sd: 0,
      variance: 0
    };
    
    let newElements = [1, 2, 3, 4, 0];
    
    for (let newElement of newElements) {
      stats = calculateNewStats(stats, newElement);
      console.log(stats);
    }
    

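Another option, swapped in here and not taken from the linked page, is Welford's online algorithm: it updates the mean incrementally instead of going through a running total, and stores m2, the sum of squared deviations from the mean, instead of the variance. Both population and sample SD then follow from m2 with a single division. A minimal sketch; `updateStats`, `m2` and `sampleSd` are hypothetical names.

```javascript
// Welford's online algorithm: keep the running mean and m2,
// the sum of squared deviations from the current mean.
const updateStats = (stats, x) => {
  const count = stats.count + 1;
  const delta = x - stats.avg;             // deviation from the old mean
  const avg = stats.avg + delta / count;   // incremental mean update
  const m2 = stats.m2 + delta * (x - avg); // uses both old and new mean
  const variance = m2 / count;             // population variance
  return {
    count,
    min: Math.min(stats.min, x),
    max: Math.max(stats.max, x),
    avg,
    m2,
    variance,
    sd: Math.sqrt(variance),
    // the sample SD needs at least two elements
    sampleSd: count > 1 ? Math.sqrt(m2 / (count - 1)) : NaN,
  };
};

// initial values: min/max start at +/-Infinity so the first element replaces them
let stats = {
  count: 0, min: Infinity, max: -Infinity, avg: 0,
  m2: 0, variance: 0, sd: 0, sampleSd: NaN,
};

for (const x of [1, 2, 3, 4, 0]) {
  stats = updateStats(stats, x);
  console.log(stats.count, stats.min, stats.max, stats.avg, stats.sd);
}
```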