I want to find geometric average of data and performance does matters.
Which one should I pick between
Keep multiplication over single variable and take Nth-root at the end of calculation
X = MUL(x[i])^(1/N)
Thus, O(N) x Multiplication Complexity + O(1) x Nth-root
Use logarithm
X = e ^ { 1/N * SUM(log(x[i])) }
Thus, O(N) x Logarithm Complexity + O(1) x Nth-division + O(1) Exponential
Specialized algorithm for geometric average. Please tell me if there is.
I thought I would try to benchmark this and get a comparison, here is my attempt.
Comparing was difficult since the list of numbers needed to be large enough to make timing it reasonable, so N is large. In my test N = 50,000,000 elements.
However, multiplying lots of numbers together which are greater than 1 overflows the double storing the product. But multiplying together numbers less than 1 gives a total product which is very small, and dividing by the number of elements gives zero.
Just a couple more things: Make sure none of your elements are zero, and the Log approach doesn't work for negative elements.
(The multiply would work without overflow if C# had a BigDecimal class with an Nth root function.)
Anyway, in my code each element is between 1 and 1.00001
On the other hand, the log approach had no problems with overflows, or underflows.
Here's the code:
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Starting...");
Console.WriteLine("");
Stopwatch watch1 = new Stopwatch();
Stopwatch watch2 = new Stopwatch();
List<double> list = getList();
double prod = 1;
double mean1 = -1;
for (int c = 0; c < 2; c++)
{
watch1.Reset();
watch1.Start();
prod = 1;
foreach (double d in list)
{
prod *= d;
}
mean1 = Math.Pow(prod, 1.0 / (double)list.Count);
watch1.Stop();
}
double mean2 = -1;
for (int c = 0; c < 2; c++)
{
watch2.Reset();
watch2.Start();
double sum = 0;
foreach (double d in list)
{
double logged = Math.Log(d, 2);
sum += logged;
}
sum *= 1.0 / (double)list.Count;
mean2 = Math.Pow(2.0, sum);
watch2.Stop();
}
Console.WriteLine("First way gave: " + mean1);
Console.WriteLine("Other way gave: " + mean2);
Console.WriteLine("First way took: " + watch1.ElapsedMilliseconds + " milliseconds.");
Console.WriteLine("Other way took: " + watch2.ElapsedMilliseconds + " milliseconds.");
Console.WriteLine("");
Console.WriteLine("Press enter to exit");
Console.ReadLine();
}
private static List<double> getList()
{
List<double> result = new List<double>();
Random rand = new Random();
for (int i = 0; i < 50000000; i++)
{
result.Add( rand.NextDouble() / 100000.0 + 1);
}
return result;
}
}
My computer output describes that both geometric means are the same, but that:
Multiply way took: 466 milliseconds
Logarithm way took: 3245 milliseconds
So, the multiply appears to be faster.
But multiply is very problematic with overflow and underflow, so I would recommend the Log approach, unless you can guarantee the product won't overflow and that the product won't get too close to zero.