I need to convert a 32-bit IEEE 754 float to a signed Q19.12 fixed-point format. The conversion must be fully deterministic, so I don't want to rely on the usual (int)(f * (1 << FRACTION_SHIFT)), since it goes through floating-point math that I can't assume behaves identically everywhere. Are there any "bit fiddling" or similar deterministic conversion methods?
Edit: "Deterministic" here means that the same floating-point input must produce exactly the same conversion result on different platforms.
While @StephenCanon's answer might be right about this particular case being fully deterministic, I've decided to stay on the safer side, and still do the conversion manually. This is the code I have ended up with (thanks to @CodesInChaos for pointers on how to do this):
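public static Fixed FromFloatSafe(float f) {
    // Extract the raw IEEE 754 bits of the float
    uint fb = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
    // Arithmetic shift: sign is 0xFFFFFFFF for negative input, 0 otherwise
    uint sign = (uint)((int)fb >> 31);
    uint exponent = (fb >> 23) & 0xFF;
    uint mantissa = fb & 0x007FFFFF;

    if (exponent == 255) {
        // Infinity, SNaN, QNaN
        throw new ArgumentException("Infinity and NaN cannot be converted.", "f");
    } else if (exponent != 0) {
        // Normalized value: add the mantissa's assumed leading 1
        mantissa |= 0x800000;
    }

    // Mantissa with adjusted sign (two's complement negation when sign is all ones)
    int raw = (int)((mantissa ^ sign) - sign);

    // Required radix point shift to convert to fixed point.
    // The float's value is mantissa * 2^(exponent - 127 - 23), so the raw value
    // is the mantissa shifted by (exponent - 127 - 23 + FRACTION_SHIFT).
    // For FRACTION_SHIFT = 12 this is exponent - 138.
    int shift = (int)exponent - 127 - 23 + FRACTION_SHIFT;

    // Do the shifting and check for overflows
    if (shift > 30) {
        throw new OverflowException();
    } else if (shift > 0) {
        long shifted = (long)raw << shift;
        if (shifted > int.MaxValue || shifted < int.MinValue) {
            throw new OverflowException();
        }
        raw = (int)shifted;
    } else {
        // C# uses only the low 5 bits of an int shift count, so clamp it to 31.
        // The arithmetic shift rounds toward negative infinity, so tiny
        // negative inputs become -1 rather than 0.
        raw >>= Math.Min(-shift, 31);
    }
    return Fixed.FromRaw(raw);
}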
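To sanity-check the bit manipulation, here is a minimal self-contained sketch of the same steps that returns the raw integer instead of a Fixed, since neither Fixed nor FRACTION_SHIFT is shown in the post. The names Q19_12Demo, ToRaw and FractionShift are hypothetical, and 12 fraction bits are assumed from the Q19.12 format. It does the arithmetic in a long so the sign handling and range check are easier to follow, and it rounds toward negative infinity like an arithmetic right shift.

```csharp
using System;

public static class Q19_12Demo
{
    // Assumption: 12 fraction bits, as in the Q19.12 format from the question.
    private const int FractionShift = 12;

    // Hypothetical helper: raw Q19.12 integer for a finite float,
    // rounded toward negative infinity.
    public static int ToRaw(float f)
    {
        uint bits = BitConverter.ToUInt32(BitConverter.GetBytes(f), 0);
        bool negative = (bits >> 31) != 0;
        int exponent = (int)((bits >> 23) & 0xFF);
        long mantissa = bits & 0x007FFFFF;

        if (exponent == 255)
            throw new ArgumentException("Infinity and NaN cannot be converted.", nameof(f));

        if (exponent != 0)
            mantissa |= 0x800000;   // implicit leading 1 of a normalized float
        else
            exponent = 1;           // subnormals use the minimum exponent

        long signed = negative ? -mantissa : mantissa;

        // value = mantissa * 2^(exponent - 127 - 23); raw = value * 2^FractionShift
        int shift = exponent - 127 - 23 + FractionShift;

        long raw;
        if (shift > 30)
            throw new OverflowException();
        else if (shift > 0)
            raw = signed << shift;                 // at most 24 + 30 bits, fits in a long
        else
            raw = signed >> Math.Min(-shift, 63);  // clamp: long shift counts use 6 bits

        if (raw > int.MaxValue || raw < int.MinValue)
            throw new OverflowException();
        return (int)raw;
    }

    public static void Main()
    {
        Console.WriteLine(ToRaw(1.0f));    // 4096
        Console.WriteLine(ToRaw(0.5f));    // 2048
        Console.WriteLine(ToRaw(-1.5f));   // -6144
        Console.WriteLine(ToRaw(1e-30f));  // 0
    }
}
```

Note that with this rounding, a tiny negative input such as -1e-30f yields -1 rather than 0. If truncation toward zero is wanted instead (which is what a plain integer cast does), shift the unsigned mantissa first and apply the sign afterwards.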