Tags: c#, list, math.net, mathnet-numerics

Best way to use Math.NET statistics functions on the properties of objects in a List


I'm trying to figure out the fastest way to perform a computation and would like to know what approach people usually take in a situation like this.

I have a List of objects with properties whose mean and standard deviation I want to compute. I thought using the Math.NET library would probably be easier and better optimised for performance.

Unfortunately, the input arguments for these functions are arrays. Is my only solution to write my own function to compute means and standard deviations? Could I write some sort of extension method for lists that uses lambda functions, like here? Or am I better off writing functions that return arrays of my object properties and using these with Math.NET?

Presumably the answer depends on some things like the size of the list? Let's say for argument's sake that the list has 50 elements. My concern is purely performance.


Solution

  • ArrayStatistics does expect arrays, because it is optimized for that special case (hence the name ArrayStatistics). Similarly, StreamingStatistics is optimized for streaming over an IEnumerable sequence without keeping the data in memory. The general class that works with all kinds of input is the Statistics class.

    Have you verified that simply using LINQ and StreamingStatistics is not fast enough for your use case? Computing these statistics for a list of merely 50 entries takes barely measurable time, unless, say, you do it a million times in a loop.

    Example with Math.NET Numerics v3.0.0-alpha7, using Tuples in a list to emulate your custom types:

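    using System;                       // Math.Sqrt, Tuple
    using System.Collections.Generic;   // List<T>
    using System.Linq;                  // Select
    using MathNet.Numerics.Statistics;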
    
    var data = new List<Tuple<string, double>>
    {
        Tuple.Create("A", 1.0),
        Tuple.Create("B", 2.0),
        Tuple.Create("C", 1.5)
    };
    
    // using the normal extension methods within `Statistics`
    var stdDev1 = data.Select(x => x.Item2).StandardDeviation();
    var mean1 = data.Select(x => x.Item2).Mean();
    
    // single pass variant (unfortunately there's no single pass MeanStdDev yet):
    var meanVar2 = data.Select(x => x.Item2).MeanVariance();
    var mean2 = meanVar2.Item1;
    var stdDev2 = Math.Sqrt(meanVar2.Item2);
    
    // directly using the `StreamingStatistics` class:
    var meanVar3 = StreamingStatistics.MeanVariance(data.Select(x => x.Item2));
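
The lambda-based extension method the question asks about can be sketched as a thin wrapper around `MeanVariance`. This is only an illustration under a few assumptions: `SelectorStatistics` and `MeanStdDevBy` are hypothetical names that are not part of Math.NET, the project references the MathNet.Numerics package, and the compiler supports top-level statements and tuple syntax (C# 9 or later). The last lines show the array-based route through `ArrayStatistics`, which may only be worth it if the same values are reused several times.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using MathNet.Numerics.Statistics;

// Same Tuple stand-in for a custom type as in the example above.
var data = new List<Tuple<string, double>>
{
    Tuple.Create("A", 1.0),
    Tuple.Create("B", 2.0),
    Tuple.Create("C", 1.5)
};

// Selector-based call: the lambda picks the property to analyse.
// For 1.0, 2.0, 1.5 the mean is 1.5 and the sample standard deviation is 0.5.
var stats = data.MeanStdDevBy(x => x.Item2);
Console.WriteLine($"mean = {stats.Mean}, stdDev = {stats.StdDev}");

// Array-based alternative: copy the property values once, then reuse the array.
double[] values = data.Select(x => x.Item2).ToArray();
double meanFromArray = ArrayStatistics.Mean(values);
double stdDevFromArray = ArrayStatistics.StandardDeviation(values);
Console.WriteLine($"mean = {meanFromArray}, stdDev = {stdDevFromArray}");

// Hypothetical helper class; not part of Math.NET Numerics.
public static class SelectorStatistics
{
    // Projects each item to a double and computes mean and sample
    // standard deviation in a single pass over the sequence.
    public static (double Mean, double StdDev) MeanStdDevBy<T>(
        this IEnumerable<T> source, Func<T, double> selector)
    {
        var meanVar = source.Select(selector).MeanVariance();
        // Item1 is the mean, Item2 the sample variance (N-1 normalizer).
        return (meanVar.Item1, Math.Sqrt(meanVar.Item2));
    }
}
```

For a list of around 50 items, the wrapper is mainly a readability convenience; it adds no measurable cost over calling `Select(...).MeanVariance()` directly.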