Tags: c, algorithm, benchmarking, numerical-methods

Performance of n-section root finding algorithm


I wrote an n-section algorithm for finding roots of a function. The working principle is exactly the same as in the bisection method, except the range is divided into N equal parts instead of two. This is my C code:

/*
    Compile with: clang -O3 -o nsect nsect.c -Wall -DCOUNT=5000000 -DNSECT=? -funroll-loops
*/

#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define N 6

#ifndef COUNT
#error COUNT not defined!
#endif

#ifndef NSECT
#define NSECT 2
#endif

float polynomial[N];
float horner( const float poly[N], float x )
{
    float val = poly[N-1];
    for ( int i = N - 2; i >= 0; i-- )
        val = poly[i] + x * val;
    return val;
}

float f( float x )
{
    return horner( polynomial, x );
}

float nsect( float a, float b, const float eps_x )
{
    float fa = f( a );
    float fb = f( b );
    if ( fa == 0 ) return a;
    else if ( fb == 0 ) return b;
    else if ( fa * fb > 0 ) return 0;

    float x[NSECT];
    float fx[NSECT];

    while ( b - a > eps_x )
    {   
        x[0] = a;
        if ( ( fx[0] = f( x[0] ) ) == 0 ) return x[0];

        int found = 0;
        for ( int i = 0; i < NSECT - 1; i++ )
        {
            x[i + 1] = a + ( b - a ) * (float)( i + 1 ) / NSECT;
            if ( ( fx[i + 1] = f( x[i + 1] ) ) == 0 )
                return x[i + 1];
            else if ( fx[i] * fx[i + 1] < 0 )
            {
                a = x[i];
                b = x[i + 1];
                found = 1;
                break;
            }
        }

        if ( !found )
            a = x[NSECT - 1]; 

    }

    return ( a + b ) * 0.5f;
}

int main( int argc, char **argv )
{
    struct timeval t0, t1;
    float *polys = malloc( COUNT * sizeof( float ) * N );
    float *p = polys;
    for ( int i = 0; i < COUNT * N; i++ )
        scanf( "%f", p++ ); 

    float xsum = 0; // So the computation isn't optimized away when we don't print the roots
    p = polys;
    gettimeofday( &t0, NULL );
    for ( int i = 0; i < COUNT; i++, p += N )
    {
        memcpy( polynomial, p, N * sizeof( float ) );
        float x = nsect( -100, 100, 1e-3 );
        xsum += x;

        #ifdef PRINT_ROOTS
            fprintf( stderr, "%f\n", x );
        #endif
    }
    gettimeofday( &t1, NULL );

    fprintf( stderr, "xsum: %f\n", xsum );
    printf( "%f ms\n", ( ( t1.tv_sec - t0.tv_sec ) * 1e6 + ( t1.tv_usec - t0.tv_usec ) ) * 1e-3 );
    free( polys );
}

EDIT: This is the command I used to compile the code: clang -O3 -o nsect nsect.c -Wall -DCOUNT=5000000 -DNSECT=? -funroll-loops. I ran everything on an i7-8700K.

I decided to test the algorithm's performance for different N values. The test consists of measuring the time needed to find any root in the range (-100; 100) for each of 5,000,000 polynomials of degree 5. The polynomials are randomly generated and have real coefficients ranging from -5 to 5. The polynomial values are calculated using Horner's method.

These are the results I got from running the code 10 times for each N (x = N, y = time [ms]):

My understanding of the worst-case performance here is that the amount of work to be done in each iteration of the main while loop is proportional to N. The main loop needs log_N(C) iterations to complete (where C > 1 is a constant - the ratio of the initial search range to the requested accuracy). This yields the following equation:

f(N) = N · log_N(C) = N · ln(C) / ln(N)

The plot of this function looks very similar to the violet curve I used above to roughly match the data.

Now, I have some (hopefully correct) conclusions and questions:

  • Firstly, is my approach valid at all?
  • Does the function I came up with really describe the relation between number of operations and N?
  • I think this is the most interesting one - I'm wondering what causes such a significant difference between N=2 and all the others? I consistently get this pattern for all my test data.

Moreover:

  • That function has its minimum at N = e ≈ 2.718..., which is closer to 3 than to 2. Also, f(3) < f(2) holds. Assuming the equation I came up with is correct, I think that would imply that the trisection method may actually be more efficient than bisection. It seems a bit counterintuitive, but the measurements seem to confirm that. Is this right?
  • If so, why does the trisection method seem to be so unpopular compared to bisection?

Thank you


Solution

  • I am commenting on the general question, not on your particular code, which I don't fully understand.

    Assuming that there is a single root known to be in an interval of length L and the desired accuracy is ε, you will need log(L/ε)/log(N) subdivision stages. Every subdivision stage requires N-1 evaluations of the function (not N) to tell which subinterval among the N contains the root.

    Hence, neglecting overhead, the total cost is proportional to (N-1)/log(N). The values of this ratio, starting from N=2, are:

    1.44, 1.82, 2.16, 2.49, 2.79, ...

    and increasing.

    Hence the theoretical optimum is with N=2. This is why trisection is not used.