How can I optimize this S-curve function?

I am working on a gamma function that generates an "S-curve". I need to run it in a real-time environment, so I need to speed it up as much as possible.

The code is as follows:

float Gamma = 2.0f; // Input variable

float GammaMult = pow(0.5f, 1.0f - Gamma);
if (Input < 1.0f && Input > 0.0f)
{
    if (Input < 0.5f)
    {
        Output = pow(Input, Gamma) * GammaMult;
    }
    else
    {
        Output = 1.0f - pow(1.0f - Input, Gamma) * GammaMult;
    }
}
else
{
    Output = Input;
}

Is there any way I can optimize this code?

Solution

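You can avoid pipeline stalls from mispredicted branches by eliminating the branch on Input<1.0f && Input>0.0f. Clamp Input to [0, 1] instead, either with saturation arithmetic if the instruction set supports it, or with max/min instructions or intrinsics (e.g. MAXSS and MINSS on x86).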
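As a rough illustration of the clamping step, here is a minimal sketch in C. The helper name saturate is an assumption chosen to match the pseudocode in this answer; it is not a standard function. The sketch uses the SSE scalar min/max intrinsics when they appear to be available and otherwise falls back to fmaxf/fminf from the C library.

```c
#include <math.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define SCURVE_HAVE_SSE 1
#endif

// Hypothetical helper: clamp x to [0, 1] without a data-dependent branch.
static inline float saturate(float x)
{
#ifdef SCURVE_HAVE_SSE
    __m128 v = _mm_set_ss(x);
    v = _mm_max_ss(v, _mm_setzero_ps());   // MAXSS: max(x, 0)
    v = _mm_min_ss(v, _mm_set_ss(1.0f));   // MINSS: min(result, 1)
    return _mm_cvtss_f32(v);
#else
    // Portable fallback; the compiler decides how to lower these calls.
    return fminf(fmaxf(x, 0.0f), 1.0f);
#endif
}
```

Values already inside [0, 1] come back unchanged; anything below 0 becomes 0 and anything above 1 becomes 1.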

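You should also eliminate the other branch (Input<0.5f) by rounding the saturated Input: the rounded value is 0 for the lower half and 1 for the upper half, so it can select between the two formulas arithmetically. Note that saturating changes the behaviour for inputs outside [0, 1]: they are clamped to 0 or 1, whereas the original code passes them through unchanged. Full algorithm: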

float GammaMult = pow(0.5f, 1.0f-Gamma);
Input = saturate(Input); // saturate via assembly or intrinsics
// Input is now in [0, 1]
float Rounded = round(Input); // round via assembly or intrinsics; Rounded is 0 or 1
float Coeff = 1.0f - 2.0f * Rounded; // +1 for the lower half, -1 for the upper half
Output = Rounded + Coeff * pow(Rounded + Coeff * Input, Gamma) * GammaMult;

Rounding should be done via asm/intrinsics as well.

If you apply this function to successive values of an array, you should consider vectorising it if the target architecture supports SIMD.