Tags: c, algorithm, benchmarking, numerical-methods

Performance of n-section root finding algorithm


I wrote an n-section algorithm for finding roots of a function. The working principle is exactly the same as in the bisection method, except the range is divided into N equal parts instead of two. This is my C code:

/*
    Compile with: clang -O3 -o nsect nsect.c -Wall -DCOUNT=5000000 -DNSECT=? -funroll-loops
*/

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define N 6

#ifndef COUNT
#error COUNT not defined!
#endif

#ifndef NSECT
#define NSECT 2
#endif

float polynomial[N];
float horner( const float poly[N], float x )
{
    float val = poly[N-1];
    for ( int i = N - 2; i >= 0; i-- )
        val = poly[i] + x * val;
    return val;
}

float f( float x )
{
    return horner( polynomial, x );
}

float nsect( float a, float b, const float eps_x )
{
    float fa = f( a );
    float fb = f( b );
    if ( fa == 0 ) return a;
    else if ( fb == 0 ) return b;
    else if ( fa * fb > 0 ) return 0;

    float x[NSECT];
    float fx[NSECT];

    while ( b - a > eps_x )
    {   
        x[0] = a;
        if ( ( fx[0] = f( x[0] ) ) == 0 ) return x[0];

        int found = 0;
        for ( int i = 0; i < NSECT - 1; i++ )
        {
            x[i + 1] = a + ( b - a ) * (float)( i + 1 ) / NSECT;
            if ( ( fx[i + 1] = f( x[i + 1] ) ) == 0 )
                return x[i + 1];
            else if ( fx[i] * fx[i + 1] < 0 )
            {
                a = x[i];
                b = x[i + 1];
                found = 1;
                break;
            }
        }

        if ( !found )
            a = x[NSECT - 1]; 

    }

    return ( a + b ) * 0.5f;
}

int main( int argc, char **argv )
{
    struct timeval t0, t1;
    float *polys = malloc( COUNT * sizeof( float ) * N );
    float *p = polys;
    for ( int i = 0; i < COUNT * N; i++ )
        scanf( "%f", p++ ); 

    float xsum = 0; // So the computation isn't optimized away when we don't print the roots
    p = polys;
    gettimeofday( &t0, NULL );
    for ( int i = 0; i < COUNT; i++, p += N )
    {
        memcpy( polynomial, p, N * sizeof( float ) );
        float x = nsect( -100, 100, 1e-3 );
        xsum += x;

        #ifdef PRINT_ROOTS
            fprintf( stderr, "%f\n", x );
        #endif
    }
    gettimeofday( &t1, NULL );

    fprintf( stderr, "xsum: %f\n", xsum );
    printf( "%f ms\n", ( ( t1.tv_sec - t0.tv_sec ) * 1e6 + ( t1.tv_usec - t0.tv_usec ) ) * 1e-3 );
    free( polys );
}

EDIT: This is the command I used to compile the code: clang -O3 -o nsect nsect.c -Wall -DCOUNT=5000000 -DNSECT=? -funroll-loops. I ran everything on an i7-8700K.

I decided to test the algorithm's performance for different N values. The test consists of measuring the time needed to find any root in the range (-100; 100) for each of 5,000,000 polynomials of degree 5. The polynomials are randomly generated and have real coefficients ranging from -5 to 5. The polynomial values are calculated using Horner's method.

These are the results I got from running the code 10 times for each N (x = N, y = time [ms]):

My understanding of the worst-case performance here is that the amount of work to be done in each iteration of the main while loop is proportional to N. The main loop needs log_N(C) iterations to complete (where C > 1 is a constant - the ratio of the initial search range to the requested accuracy). This yields the following equation:

f(N) = N · log_N(C) = N · ln(C) / ln(N)

The plot of this function looks very similar to the violet curve I used above to roughly match the data.

Now, I have some (hopefully correct) conclusions and questions:

  • Firstly, is my approach valid at all?
  • Does the function I came up with really describe the relation between number of operations and N?
  • I think this is the most interesting one - I'm wondering what causes such a significant difference between N=2 and all the others? I consistently get this pattern for all my test data.

Moreover:

  • That function has its minimum at N = e ≈ 2.718..., which is closer to 3 than to 2. Also, f(3) < f(2) holds. Assuming the equation I came up with is correct, I think that would imply that the trisection method may actually be more efficient than bisection. It seems a bit counterintuitive, but the measurements seem to confirm that. Is this right?
  • If so, why does the trisection method seem to be so unpopular compared to bisection?

Thank you


Solution

  • I am commenting on the general question, not on your particular code, which I don't fully understand.

    Assuming that there is a single root known to be in an interval of length L and the desired accuracy is ε, you will need log(L/ε)/log(N) subdivision stages. Every subdivision stage requires N-1 evaluations of the function (not N) to tell which subinterval among the N contains the root.

    Hence, neglecting overhead, the total cost is proportional to (N-1)/log(N). The values of this ratio, starting from N=2, are:

    1.44, 1.82, 2.16, 2.49, 2.79, ...

    and increasing.

    Hence the theoretical optimum is with N=2. This is why trisection is not used.