Search code examples
cperformancemathoptimizationtrigonometry

C: Improving performance of function with heavy sin() usage


I have a C function that computes the values of 4 sines based on time elapsed. Using gprof, I figured that this function uses 100% (100.7% to be exact lol) of the CPU time.

void
update_sines(void)
{
    clock_gettime(CLOCK_MONOTONIC, &spec);
    s = spec.tv_sec;
    ms = spec.tv_nsec * 0.0000001;
    etime = concatenate((long)s, ms);

    int k;
    for (k = 0; k < 799; ++k)
    {
        double A1 = 145 * sin((RAND1 * k + etime) * 0.00333) + RAND5;           // Amplitude
        double A2 = 100 * sin((RAND2 * k + etime) * 0.00333) + RAND4;           // Amplitude
        double A3 = 168 * sin((RAND3 * k + etime) * 0.00333) + RAND3;           // Amplitude
        double A4 = 136 * sin((RAND4 * k + etime) * 0.00333) + RAND2;           // Amplitude

        double B1 = 3 + RAND1 + (sin((RAND5 * k) * etime) * 0.00216);           // Period
        double B2 = 3 + RAND2 + (sin((RAND4 * k) * etime) * 0.002);         // Period
        double B3 = 3 + RAND3 + (sin((RAND3 * k) * etime) * 0.00245);           // Period
        double B4 = 3 + RAND4 + (sin((RAND2 * k) * etime) * 0.002);         // Period

        double x = k;                                   // Current x

        double C1 = 0.6 * etime;                            // X axis move
        double C2 = 0.9 * etime;                            // X axis move
        double C3 = 1.2 * etime;                            // X axis move
        double C4 = 0.8 * etime + 200;                          // X axis move

        double D1 = RAND1 + sin(RAND1 * x * 0.00166) * 4;               // Y axis move
        double D2 = RAND2 + sin(RAND2 * x * 0.002) * 4;                 // Y axis move
        double D3 = RAND3 + cos(RAND3 * x * 0.0025) * 4;                // Y axis move
        double D4 = RAND4 + sin(RAND4 * x * 0.002) * 4;                 // Y axis move

        sine1[k] = A1 * sin((B1 * x + C1) * 0.0025) + D1;
        sine2[k] = A2 * sin((B2 * x + C2) * 0.00333) + D2 + 100;
        sine3[k] = A3 * cos((B3 * x + C3) * 0.002) + D3 + 50;
        sine4[k] = A4 * sin((B4 * x + C4) * 0.00333) + D4 + 100;
    }

}

And this is the output from gprof:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
100.07      0.04     0.04  

I'm currently getting a frame rate of roughly 30-31 fps using this. Now I figure there as to be a more efficient way to do this.

As you noticed I already changed all the divisions to multiplications but that had very little effect on performance.

How could I increase the performance of this math heavy function?


Solution

  • Besides all the other advice given in other answers, here is a pure algorithmic optimization.

    In most cases, you're computing something of the form sin(k * a + b), where a and b are constants, and k is a loop variable. If you were also to compute cos(k * a + b), then you could use a 2D rotation matrix to form a recurrence relationship (in matrix form):

    ⎡cos(k*a + b)⎤ = ⎡cos(a)  -sin(a)⎤ * ⎡cos((k-1)*a + b)⎤
    ⎣sin(k*a + b)⎦   ⎣sin(a)   cos(a)⎦   ⎣sin((k-1)*a + b)⎦
    

    In other words, you can calculate the value for the current iteration in terms of the value from the previous iteration. Thus, you only need to to do the full trig calculation for k == 0, but the rest can be calculated via this recurrence (once you have calculated cos(a) and sin(a), which are constants). So you eliminate 75% of the trig function calls (it's not clear the same trick can be pulled for the final set of trig calls).