I am working on a gamma function that generates an "S-curve". I need to run it in a real-time environment, so I need to speed it up as much as possible.
The code is as follows:
float Gamma = 2.0f; // Input variable
float GammaMult = pow(0.5f, 1.0f-Gamma);
if(Input<1.0f && Input>0.0f)
{
    if(Input<0.5f)
    {
        Output = pow(Input,Gamma)*GammaMult;
    }
    else
    {
        Output = 1.0f-pow(1.0f-Input,Gamma)*GammaMult;
    }
}
else
{
    Output = Input;
}
Is there any way I can optimize this code?
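You can avoid pipeline stalls by eliminating the branch on Input<1.0f && Input>0.0f: clamp Input to [0, 1] instead, using saturation arithmetic if the instruction set supports it, or min/max intrinsics (e.g. x86 MAXSS/MINSS).
Note that this clamps out-of-range inputs to 0 or 1, whereas your original code passes them through unchanged.
You should also eliminate the other branch by rounding the saturated Input: the rounded value is 0 for the lower half of the curve and 1 for the upper half, so it can select between the two formulas arithmetically.
Full algorithm: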
float GammaMult = pow(0.5f, 1.0f-Gamma);
Input = saturate(Input); // clamp to [0, 1] via assembly or intrinsics
// Input is now in [0, 1]
float Rounded = round(Input); // 0 or 1; round via assembly or intrinsics
float Coeff = 1.0f - 2.0f * Rounded; // +1 for the lower half, -1 for the upper half
Output = Rounded + Coeff * pow(Rounded + Coeff * Input, Gamma) * GammaMult;
Rounding should be done via asm/intrinsics as well.
If you use this function on e.g. successive values of an array you should consider vectorising it if the target architecture supports SIMD.