How to achieve maximal accuracy in the sum of series calculation?

I wrote a code to calculate the sum of the series 2^(-k), but I don't know how to improve the accuracy of this calculation. This is what I've done so far.

#include <iostream>
#include <math.h>
using namespace std;

int main()
{
    int i, n;
    float sum = 0;

    cout << "Enter the value of n: ";
    cin >> n;

    for (i=1; i<=n; i++)
        sum += 1.0/pow(2,i);

    cout << "Sum: " << sum;


    return 0;
}

Any suggestion and/or help is very appreciated.

Solution

To see more precise output, you need to request more precision than the C++ default. One way to do this is:

#include <iomanip>
…
   std::cout << std::setprecision(99);

Next, consider this code:

for (i=1; i<=n; i++)
    sum += 1.0/pow(2,i);

First, recognize that the quality of pow implementations varies. The C and C++ standards are lax about the quality of floating-point operations, and some pow implementations return results for simple cases like pow(10, 3) that differ slightly from the mathematical result. Due to the ways pow is frequently implemented, pow(2, i) might not suffer from this problem, but it should be considered.

Let’s assume pow(2, i) computes the proper result exactly. Let’s also assume that your C++ implementation uses the common IEEE-754 basic 32-bit binary floating-point format for float. If so, there is no error in the sum computed above for n ≤ 24.

This is because each term, 1.0/pow(2, i), is representable as a single bit in the significand (fraction part) of a float, and the float has 24-bit significands, so 24 consecutive bits can be represented with no error. Once you increase the precision used to format output, the sums shown for n ≤ 24 should be exact.

When n = 25, the sum no longer fits in a float. At this point, the mathematical result will be rounded to the nearest value representable in a float, generally using the rule that, if there is a tie between the two nearest representable values, the one with the even low bit will be chosen. This means the result will be 1, exactly. For all n > 24, the result will be 1.

While using the float type, it is not possible to increase the accuracy beyond this. That is because, of all the values that can be represented in the float type, 1 is the value closest to the exact mathematical sum of the series. There simply is no closer representable value, so no computation or alteration of source code can produce any more accurate value.

You can produce more accurate values by using double instead of float. If the IEEE-754 basic 64-bit binary format is used for double, then this will produce exact results for n ≤ 53. For n > 53, the result will again be 1, and the sum could be improved only by using extended-precision arithmetic.

Additionally, note that:

float sum = 0;
for (i=1; i<=n; i++)
    sum += 1.0/pow(2,i);

is mathematically equivalent to:

float sum = 1 - pow(2.f, (float) -n);