Search code examples
c#mathematical-optimizationlogarithm

Math.Log vs multiplication complexity in terms of geometric average which is better?


I want to find geometric average of data and performance does matters.
Which one should I pick between

  1. Keep multiplication over single variable and take Nth-root at the end of calculation

    X = MUL(x[i])^(1/N)  
    

    Thus, O(N) x Multiplication Complexity + O(1) x Nth-root

  2. Use logarithm

    X = e ^ { 1/N * SUM(log(x[i])) }  
    

    Thus, O(N) x Logarithm Complexity + O(1) x Nth-division + O(1) Exponential

  3. Specialized algorithm for geometric average. Please tell me if there is.


Solution

  • I thought I would try to benchmark this and get a comparison, here is my attempt.

    Comparing was difficult since the list of numbers needed to be large enough to make timing it reasonable, so N is large. In my test N = 50,000,000 elements.

    However, multiplying lots of numbers together which are greater than 1 overflows the double storing the product. But multiplying together numbers less than 1 gives a total product which is very small, and dividing by the number of elements gives zero.

    Just a couple more things: Make sure none of your elements are zero, and the Log approach doesn't work for negative elements.

    (The multiply would work without overflow if C# had a BigDecimal class with an Nth root function.)

    Anyway, in my code each element is between 1 and 1.00001

    On the other hand, the log approach had no problems with overflows, or underflows.

    Here's the code:

    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Starting...");
            Console.WriteLine("");
    
            Stopwatch watch1 = new Stopwatch();
            Stopwatch watch2 = new Stopwatch();
    
            List<double> list = getList();
    
            double prod = 1;
    
            double mean1 = -1;
    
            for (int c = 0; c < 2; c++)
            {
                watch1.Reset();
    
                watch1.Start();
    
                prod = 1;
    
                foreach (double d in list)
                {
                    prod *= d;
                }
    
                mean1 = Math.Pow(prod, 1.0 / (double)list.Count);
    
                watch1.Stop();
    
            }
    
            double mean2 = -1;
    
            for (int c = 0; c < 2; c++)
            {
                watch2.Reset();
    
                watch2.Start();
    
                double sum = 0;
    
                foreach (double d in list)
                {
                    double logged = Math.Log(d, 2);
                    sum += logged;
                }
    
                sum *= 1.0 / (double)list.Count;
    
                mean2 = Math.Pow(2.0, sum);
    
                watch2.Stop();
    
            }
            Console.WriteLine("First way gave: " + mean1);
            Console.WriteLine("Other way gave: " + mean2);
    
            Console.WriteLine("First way took: " + watch1.ElapsedMilliseconds + " milliseconds.");
            Console.WriteLine("Other way took: " + watch2.ElapsedMilliseconds + " milliseconds.");
    
            Console.WriteLine("");
            Console.WriteLine("Press enter to exit");
            Console.ReadLine();
        }
    
        private static List<double> getList()
        {
            List<double> result = new List<double>();
    
            Random rand = new Random();
    
            for (int i = 0; i < 50000000; i++)
            {
                result.Add( rand.NextDouble() / 100000.0 + 1);
            }
    
            return result;
        }
    }
    

    My computer output describes that both geometric means are the same, but that:

    Multiply  way took: 466 milliseconds
    Logarithm way took: 3245 milliseconds
    

    So, the multiply appears to be faster.

    But multiply is very problematic with overflow and underflow, so I would recommend the Log approach, unless you can guarantee the product won't overflow and that the product won't get too close to zero.