c++ floating-point double floating-point-precision double-precision

Assigning 2^31 to a float or double


I have to multiply an integer value by 2^31. I have googled it, and it looks like a 64-bit double has a range of 2.23e-308 <= |X| <= 1.79e308, and a float has a range of 1.18e-38 <= |X| <= 3.40e38.

That's a lot more than what I need. But it doesn't work.

I have this constant value in my header file:

static const float SCALE_FACTOR = 2^-31;

and if I then just do:

float dummy = SCALE_FACTOR;

Then, dummy's value is 11.

I don't know if the problem is assigning a constant value like that, but I don't know how else to write it without losing precision.

Any help?

EDIT: Sorry, stupid question. My MATLAB background betrayed me, and I forgot that ^ is not exponentiation in C++. I have voted to close.


Solution

  • ^ is a bitwise xor operator in C++, not a mathematical exponentiation operator. You have a few alternatives.

    1. Because you're storing the constant in an intrinsically lossy format, a float, you could just work out the base-10 e form literal and use that: perhaps something like 4.6566128730773926e-010 for 2^-31. This is prone to error (I've made one, for example), and isn't necessarily portable between floating point formats.
    2. You can set the constant using a constant expression that can be evaluated at compile time, and use an integer literal: either 0x80000000 or 1UL << 31 for 2^31, or 1.0f / 0x80000000 for 2^-31 for example.

    You could also use a pow function of some kind (std::pow or std::ldexp from <cmath>, for example) to calculate the value at runtime, of course.

    Incidentally, if you're using integers, why not just use a long long or another 64-bit integral type, instead of a floating-point type, which may well present you with rounding errors? It isn't totally clear from your question whether you are looking at floating-point values because you need their range, or because you are merely worried about overflowing a 32-bit integer.
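To make the alternatives above concrete, here is a minimal sketch, assuming a C++11 compiler. Only SCALE_FACTOR comes from the question; xor_of, TWO_POW_31, scale_factor_runtime, check_scale_factor and scale_by_two_pow_31 are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// ^ is bitwise XOR in C++. With operands 2 and 31:
// 0b00010 XOR 0b11111 = 0b11101 = 29, which is nowhere near 2^31.
constexpr int xor_of(int a, int b) { return a ^ b; }
static_assert(xor_of(2, 31) == 29, "^ is XOR, not exponentiation");

// 2^31 as an integer constant. A 64-bit type is used because 2^31
// does not fit in a 32-bit signed int.
constexpr std::int64_t TWO_POW_31 = std::int64_t{1} << 31;
static_assert(TWO_POW_31 == 2147483648LL, "1 << 31 is 2^31");

// 2^-31 as a compile-time constant. Powers of two in this range are
// exactly representable in an IEEE 754 float, so nothing is lost here.
constexpr float SCALE_FACTOR = 1.0f / 2147483648.0f;

// Runtime alternative: std::ldexp(x, n) computes x * 2^n.
inline float scale_factor_runtime() {
    return std::ldexp(1.0f, -31);
}

// Sanity check that both routes give the same value.
inline void check_scale_factor() {
    assert(SCALE_FACTOR == scale_factor_runtime());
}

// Integer-only scaling (hypothetical helper): widen to 64 bits first,
// then multiply, so the result of a 32-bit input times 2^31 cannot overflow.
inline std::int64_t scale_by_two_pow_31(std::int32_t value) {
    return static_cast<std::int64_t>(value) * TWO_POW_31;
}
```

For example, scale_by_two_pow_31(3) returns 6442450944 with no rounding at all, whereas the same product stored in a float would be limited to 24 bits of significand.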