I wrote an N-section algorithm for finding roots of a function. The working principle is exactly the same as in the bisection method, except that the search range is divided into N equal parts instead of two. This is my C code:
/*
Compile with: clang -O3 -o nsect nsect.c -Wall -DCOUNT=5000000 -DNSECT=? -funroll-loops
*/
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define N 6 // Number of coefficients (degree-5 polynomial)

#ifndef COUNT
#error COUNT not defined!
#endif

#ifndef NSECT
#define NSECT 2
#endif

float polynomial[N];

// Evaluate the polynomial at x using Horner's method
float horner( const float poly[N], float x )
{
	float val = poly[N-1];
	for ( int i = N - 2; i >= 0; i-- )
		val = poly[i] + x * val;
	return val;
}

float f( float x )
{
	return horner( polynomial, x );
}

// Find a root of f in [a, b] by repeatedly splitting the bracketing
// interval into NSECT equal parts
float nsect( float a, float b, const float eps_x )
{
	float fa = f( a );
	float fb = f( b );

	if ( fa == 0 ) return a;
	else if ( fb == 0 ) return b;
	else if ( fa * fb > 0 ) return 0; // No sign change - no root reported

	float x[NSECT];
	float fx[NSECT];

	while ( b - a > eps_x )
	{
		// Note: f(a) is re-evaluated on every pass
		x[0] = a;
		if ( ( fx[0] = f( x[0] ) ) == 0 ) return x[0];

		int found = 0;
		for ( int i = 0; i < NSECT - 1; i++ )
		{
			x[i + 1] = a + ( b - a ) * (float)( i + 1 ) / NSECT;
			if ( ( fx[i + 1] = f( x[i + 1] ) ) == 0 )
				return x[i + 1];
			else if ( fx[i] * fx[i + 1] < 0 )
			{
				// Sign change - narrow the bracket to this subinterval
				a = x[i];
				b = x[i + 1];
				found = 1;
				break;
			}
		}

		// No sign change in the first NSECT - 1 subintervals, so the
		// root must lie in the last one: [x[NSECT-1], b]
		if ( !found )
			a = x[NSECT - 1];
	}

	return ( a + b ) * 0.5f;
}

int main( int argc, char **argv )
{
	struct timeval t0, t1;
	float *polys = malloc( COUNT * sizeof( float ) * N );
	float *p = polys;

	// Read COUNT * N coefficients from stdin
	for ( int i = 0; i < COUNT * N; i++ )
		scanf( "%f", p++ );

	float xsum = 0; // So the code isn't optimized out when we don't print the roots

	p = polys;
	gettimeofday( &t0, NULL );
	for ( int i = 0; i < COUNT; i++, p += N )
	{
		memcpy( polynomial, p, N * sizeof( float ) );
		float x = nsect( -100, 100, 1e-3 );
		xsum += x;

		#ifdef PRINT_ROOTS
		fprintf( stderr, "%f\n", x );
		#endif
	}
	gettimeofday( &t1, NULL );

	fprintf( stderr, "xsum: %f\n", xsum );
	printf( "%f ms\n", ( ( t1.tv_sec - t0.tv_sec ) * 1e6 + ( t1.tv_usec - t0.tv_usec ) ) * 1e-3 );

	free( polys );
}
I ran everything on an i7-8700K.
I decided to test the algorithm's performance for different values of N. The test measures the time needed to find any root in the range (-100, 100) for each of 5,000,000 polynomials of degree 5. The polynomials are randomly generated with real coefficients ranging from -5 to 5, and their values are computed using Horner's method.
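For completeness, here is a minimal sketch of how such test input could be generated (this generator is my assumption; the one actually used for the benchmark isn't shown). It writes COUNT * N coefficients, roughly uniform in [-5, 5], to stdout, ready to be piped into ./nsect:

/* gen.c - hypothetical test-data generator (my sketch, not the
   original). Compile with e.g.: cc -O2 -o gen gen.c
   Run as: ./gen | ./nsect */
#include <stdio.h>
#include <stdlib.h>

#define COUNT 5000000
#define N 6 // A degree-5 polynomial has 6 coefficients

int main( void )
{
	srand( 42 ); // Fixed seed so the input is reproducible
	for ( int i = 0; i < COUNT * N; i++ )
	{
		// Crude uniform sampling in [-5, 5]; fine for benchmarking
		float c = rand() / (float)RAND_MAX * 10.0f - 5.0f;
		printf( "%f\n", c );
	}
	return 0;
}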
These are the results I got from running the code 10 times for each N (x = N, y = time [ms]):
My understanding of the worst-case performance here is that the amount of work to be done in each iteration of the main while loop is proportional to N, and the loop needs log_N(C) iterations to complete (where C > 1 is a constant: the ratio of the initial search range to the requested accuracy). This yields the following equation:

t(N) ∝ N · log_N(C) = N · log(C) / log(N)
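As a quick sanity check (my arithmetic, not from the original measurements): for this benchmark C = 200 / 10^-3 = 2·10^5, so the model predicts roughly 2 · log_2(2·10^5) ≈ 35 evaluations per root for N = 2 and 6 · log_6(2·10^5) ≈ 41 for N = 6; larger N needs fewer iterations but does more total work.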
Its plot looks very similar to the violet curve I used above to roughly match the data.
Now, I have some (hopefully correct) conclusions and questions:
Moreover:
Thank you
I am commenting on the general question, not on your particular code, which I don't fully understand.
Assuming that there is a single root known to be in an interval of length L and the desired accuracy is ε, you will need log(L/ε)/log(N) subdivision stages. Every subdivision stage requires N-1 evaluations of the function (not N), to tell which subinterval among N contains the root.
Hence, neglecting overhead, the total cost is proportional to (N-1)/log(N). The values of this ratio, starting from N=2, are

1.44, 1.82, 2.16, 2.49, 2.79, ...

and they only grow from there.
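For what it's worth, a few lines of C (my addition, not part of the original answer) reproduce these numbers:

/* ratio.c - tabulates the cost factor (N - 1) / log(N) for N = 2..10.
   Natural log is used; any other base only rescales the column.
   Compile with e.g.: cc -o ratio ratio.c -lm */
#include <math.h>
#include <stdio.h>

int main( void )
{
	for ( int n = 2; n <= 10; n++ )
		printf( "N = %2d  cost factor = %.2f\n", n, ( n - 1 ) / log( n ) );
	return 0;
}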
Hence the theoretical optimum is with N=2. This is why trisection is not used.