Search code examples
c++c++11sqrtmath.sqrt

Fastest way to get square root in float value


I am trying to find a fastest way to make square root of any float number in C++. I am using this type of function in a huge particles movement calculation like calculation distance between two particle, we need a square root etc. So If any suggestion it will be very helpful. I have tried and below is my code

#include <math.h>
#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

#define CHECK_RANGE 100

inline float msqrt(float a)
{
    int i;
    for (i = 0;i * i <= a;i++);
    
    float lb = i - 1; //lower bound
    if (lb * lb == a)
        return lb;
    float ub = lb + 1; // upper bound
    float pub = ub; // previous upper bound
    for (int j = 0;j <= 20;j++)
    {
        float ub2 = ub * ub;
        if (ub2 > a)
        {
            pub = ub;
            ub = (lb + ub) / 2; // mid value of lower and upper bound
        }
        else
        {
            lb = ub; 
            ub = pub;
        }
    }
    return ub;
}

void check_msqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i++)
    {
        msqrt(i);
    }
}

void check_sqrt()
{
    for (size_t i = 0; i < CHECK_RANGE; i++)
    {
        sqrt(i);
    }
}

int main()
{
    auto start1 = high_resolution_clock::now();
    check_msqrt();
    auto stop1 = high_resolution_clock::now();

    auto duration1 = duration_cast<microseconds>(stop1 - start1);
    cout << "Time for check_msqrt = " << duration1.count() << " micro secs\n";


    auto start2 = high_resolution_clock::now();
    check_sqrt();
    auto stop2 = high_resolution_clock::now();

    auto duration2 = duration_cast<microseconds>(stop2 - start2);
    cout << "Time for check_sqrt = " << duration2.count() << " micro secs";
    
    //cout << msqrt(3);

    return 0;
}

output of above code showing the implemented method 4 times more slow than sqrt of math.h file. I need faster than math.h version. enter image description here


Solution

  • In short, I do not think it is possible to implement something generally faster than the standard library version of sqrt.

    Performance is a very important parameter when implementing standard library functions and it is fair to assume that such a commonly used function as sqrt is optimized as much as possible.

    Beating the standard library function would require a special case, such as:

    • Availability of a suitable assembler instruction - or other specialized hardware support - on the particular system for which the standard library has not been specialized.
    • Knowledge of the needed range or precision. The standard library function must handle all cases specified by the standard. If the application only needs a subset of that or maybe only requires an approximate result then perhaps an optimization is possible.
    • Making a mathematical reduction of the calculations or combine some calculation steps in a smart way so an efficient implementation can be made for that combination.