I'm implementing some 32-bit float trigonometry in C# using Mono, hopefully utilizing Mono.Simd. I'm only missing solid range reduction currently. I'm rather stuck now, because apparently Mono's SIMD extensions does not include conversions between floats and integers, meaning I have no access to rounding/truncation which would be the usual method. I can however convert bitwise between ints and floats.
Can something like this be done? I can scale the domain up and down if needed, but ideally the range reduction should result in a domain of [0, 2 pi] or [-pi, pi]. I have a hunch that it would be possible to do some IEEE magic with the exponent, if the domain is a power of 2, but I'm really not sure how to.
Edit: Okay, I've tried messing around with this C code and it feels like I'm on the verge of something (it doesn't work but the fractional part is always correct, in decimal / base10 at least...). The core principle seems to be getting the exponent difference between your domain and the input exponent, and composing a new float with a shifted mantissa and an adjusted exponent.. But it won't work for negatives, and I have no idea how to handle non-powers of 2 (or anything fractional - in fact, anything else than 2 doesn't work!).
// here's another more correct attempt:
float fmodulus(float val, int domain)
{
const int mantissaMask = 0x7FFFFF;
const int exponentMask = 0x7F800000;
int ival = *(int*)&val;
int mantissa = ival & mantissaMask;
int rawExponent = ival & exponentMask;
int exponent = (rawExponent >> 23) - (129 - domain);
// powers over one:
int p = exponent;
mantissa <<= p;
rawExponent = exponent >> p;
rawExponent += 127;
rawExponent <<= 23;
int newVal = rawExponent & exponentMask;
newVal |= mantissa & mantissaMask;
float ret = *(float*)&newVal;
return ret;
}
float range_reduce(float value, int range )
{
const int mantissaMask = 0x7FFFFF;
const int exponentMask = 0x7F800000;
int ival = *(int*)&value;
// grab exponent:
unsigned exponent = (ival & exponentMask) >> 23;
// grab mantissa:
unsigned mantissa = ival & mantissaMask;
// remove bias, and see how much the exponent is over range/domain
unsigned char erange = (unsigned char)(exponent - (125 + range));
// check if sign bit is set - that is, the exponent is under our range
if (erange & 0x80)
{
// don't do anything then.
erange = 0;
}
// shift mantissa (and chop off bits) by the reduced amount
int inewVal = (mantissa << (erange)) & mantissaMask;
// add exponent, and subtract the amount we reduced the argument with
inewVal |= ((exponent - erange) << 23) & exponentMask;
// reinterpret
float newValue = *(float*)&inewVal;
return newValue;
//return newValue - ((erange) & 0x1 ? 1.0f : 0.0f);
}
int main()
{
float val = 2.687f;
int ival = *(int*)&val;
float correct = fmod(val, 2);
float own = range_reduce(val, 2);
getc(stdin);
}
Edit 2:
Okay, I'm really trying to understand this in terms of the ieee binary system. If we write the modulus operation like this:
output = input % 2
[exponent] + [mantissa_bit_n_times_exponent]
3.5 = [2] + [1 + 0.5] ->[1] + [0.5] = 1.5
4.5 = [4] + [0 + 0 + 0.5] ->[0.5] + [0] = 0.5
5.5 = [4] + [0 + 1 + 0.5] ->[1] + [0.5] = 1.5
2.5 = [2] + [0 + 0.5] ->[0.5] + [0] = 0.5
2.25 = [2] + [0 + 0 + 0.25] ->[0.25] = 0.25
2.375 = [2] + [0 + 0 + 0.25 + 0.125] ->[0.25] + [0.125] = 0.375
13.5 = [8] + [4 + 0 + 1 + 0.5] ->[1] + [0.5] = 1.5
56.5 = [32] + [16 + 8 + 0 + 0 + 0 + 0.5] ->[0.5] = 0.5
We can see the output in all cases is a new number, with no original exponent and the mantissa shifted an amount (that is based on the exponent and the first non-zero bits of the mantissa after the first exponent-bits of the mantissa is ignored) into the exponent. But I'm not really sure if this is the correct approach, it just works out nicely on paper.
Edit3: I'm stuck on Mono version 2.0.50727.1433
You can reduce the problem to taking a float mod 1. To simplify that, you can compute the floor of the float using bit operations, then use a floating point subtraction. The following is (unsafe) C# code for these operations:
// domain is assumed to be positive
// returns value in [0,domain)
public float fmodulus(float val, float domain)
{
if (val < 0)
{
float negative = fmodulus(-val, domain);
if (domain - negative == domain)
return 0;
else
return domain-negative;
}
if (val < domain)
return val; // this avoids losing accuracy
return fmodOne(val / domain) * domain;
}
// assumes val >= 1, so val is positive and the exponent is at least 0
unsafe public float fmodOne(float val)
{
int iVal = *(int*)&val;
int uncenteredExponent = iVal >> 23;
int exponent = uncenteredExponent - 127; // 127 corresponds to 2^0 times the mantissa
if (exponent >= 23)
return 0; // not enough precision to distinguish val from an integer
int unneededBits = 23 - exponent; // between 0 and 23
int iFloorVal = (iVal >> unneededBits) << unneededBits; // equivalent to using a mask to zero the bottom bits of the mantissa
float floorVal = *(float*)&iFloorVal; // convert the bit pattern back to a float
return val-floorVal;
}
For example, fmodulus(100.1f, 1) is 0.09999847. The bit pattern of 100.1f is
0 10000101 10010000011001100110011
The bit pattern of floorVal (100f) is
0 10000101 10010000000000000000000
A floating point subtraction gives something close to 0.1f:
0 01111011 10011001100110000000000
Actually, I was surprised that the last 8 bits were zeroed out. I thought only the last 6 bits of 0.1f were supposed to be replaced with 0. Perhaps one can do better than relying on the floating point subtraction.