Tags: c#, linq, standard-deviation

How to calculate the population standard deviation for each index position across a List<double[]>?


I have a list of double arrays: List<double[]> ys

Each array holds the y-values of an x-y plot. I want to calculate the population standard deviation at every x-point, that is, for each index position across all the arrays. Example:

Take the first element of every array, calculate their population standard deviation, and put the value in a new array. Then move to the next element of every array, calculate its population standard deviation, and add it to the new array. Repeat until the end of the arrays is reached.

Is there any way to achieve this quickly, without nested for loops, using LINQ or something similar?

Example input: ys = {[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]}

Expected output (double[]): [44.69899328, 89.39798655, 134.0969798, 178.7959731, 223.4949664]

44.69899328 comes from: 1, 10, 100

89.39798655 comes from: 2, 20, 200

134.0969798 comes from: 3, 30, 300

178.7959731 comes from: 4, 40, 400

223.4949664 comes from: 5, 50, 500
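As a check on the expected output: the population standard deviation divides by the number of values N (not N - 1, which would give the sample standard deviation). Worked through for the first column (1, 10, 100):

```latex
\mu = \frac{1}{N}\sum_{k=1}^{N} x_k, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{k=1}^{N} (x_k - \mu)^2}

\mu = \frac{1 + 10 + 100}{3} = 37

\sigma = \sqrt{\frac{(1-37)^2 + (10-37)^2 + (100-37)^2}{3}}
       = \sqrt{\frac{1296 + 729 + 3969}{3}}
       = \sqrt{1998} \approx 44.69899
```

The remaining columns are the first column multiplied by 2, 3, 4 and 5, and the standard deviation scales by the same factor, which is why each expected value is a multiple of 44.69899.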


Solution

  • If all sub-arrays have the same length, this could be:

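    double[] stdDevs = Enumerable.Range(0, ys[0].Length)
        .Select(i => ys.Select(y => y[i]))   // column i: the i-th element of every array
        .Select(StdDev)                       // population standard deviation of that column
        .ToArray();                           // the question asks for a double[]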
    

    If you also want to keep the input values next to each result, replace .Select(StdDev) with .Select(Z => new { Z, V = StdDev(Z) }).

    Test:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Same data as the question, typed as List<double[]> to match it.
    var ys = new List<double[]>
    {
        new double[] { 1, 2, 3, 4, 5 },
        new double[] { 10, 20, 30, 40, 50 },
        new double[] { 100, 200, 300, 400, 500 },
    };

    var stdDevs = Enumerable.Range(0, ys[0].Length)
        .Select(i => ys.Select(y => y[i]))
        .Select(Z => new { Z, V = StdDev(Z) });

    foreach (var d in stdDevs)
    {
        Console.WriteLine($"Std dev for {string.Join(",", d.Z)} is {d.V}");
    }

    // Population standard deviation: divides by N, not N - 1.
    static double StdDev(IEnumerable<double> values)
    {
        // From https://stackoverflow.com/questions/3141692/standard-deviation-of-generic-list
        // by Jonathan DeMarks
        double avg = values.Average();
        return Math.Sqrt(values.Average(v => Math.Pow(v - avg, 2)));
    }
    

    Output:

    Std dev for 1,10,100 is 44.69899327725402
    Std dev for 2,20,200 is 89.39798655450804
    Std dev for 3,30,300 is 134.09697983176207
    Std dev for 4,40,400 is 178.79597310901607
    Std dev for 5,50,500 is 223.4949663862701
    

    Different lengths

    If the sub-arrays have different lengths, the query is less tidy but still readable. Arrays that are too short for index i are skipped for that column:

    var stdDevs = Enumerable.Range(0, ys.Max(y => y.Length))
        .Select(i => ys.Where(y => i < y.Length).Select(y => y[i]))
        .Select(Z => new { Z, V = StdDev(Z) });
    

    If this is run with 5 and 500 removed, only 50 is left at the last index, and the result is:

    Std dev for 1,10,100 is 44.69899327725402
    Std dev for 2,20,200 is 89.39798655450804
    Std dev for 3,30,300 is 134.09697983176207
    Std dev for 4,40,400 is 178.79597310901607
    Std dev for 50 is 0
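Putting both cases together, here is a minimal self-contained sketch that returns a double[] as the question asks. The names ColumnStdDevs and PopulationStdDev are hypothetical helpers introduced only for this example; it assumes C# 9+ top-level statements and the ragged data from the run above (5 and 500 removed).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Question's data with 5 and 500 removed, so the arrays have different lengths.
var ys = new List<double[]>
{
    new double[] { 1, 2, 3, 4 },
    new double[] { 10, 20, 30, 40, 50 },
    new double[] { 100, 200, 300, 400 },
};

double[] result = ColumnStdDevs(ys);
Console.WriteLine(string.Join(", ", result));

// Hypothetical helper: population standard deviation for each index position.
// Arrays too short to have index i are skipped for that column.
static double[] ColumnStdDevs(IReadOnlyList<double[]> arrays)
{
    if (arrays.Count == 0) return Array.Empty<double>(); // Max() would throw on an empty list

    return Enumerable.Range(0, arrays.Max(a => a.Length))
        .Select(i => arrays.Where(a => i < a.Length).Select(a => a[i]).ToArray())
        .Select(PopulationStdDev)
        .ToArray();
}

// Hypothetical helper: divides by N (population), not N - 1 (sample).
static double PopulationStdDev(double[] values)
{
    double mean = values.Average();
    return Math.Sqrt(values.Average(v => (v - mean) * (v - mean)));
}
```

Each column is materialized with ToArray() so that the mean and the squared deviations are computed from the same array, instead of re-running the Where/Select query twice as happens when a lazy IEnumerable<double> is passed to StdDev.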