c++ math linear-algebra numerical intel-mkl

Compute numerical error


As seen in this question, the results MKL gives differ between serial and distributed execution. I would like to study that error. From my book I have:

|ε_(x_c)| = |x - x_c| <= 1/2 * 10^(-d) is the absolute error, where x is the actual number, x_c is the number the computer has, and d specifies the number of decimal digits that are accurate.

|ρ_(x_c)| = |x - x_c|/|x| <= 5 * 10^(-s) is the absolute relative error, where s specifies the number of significant digits.

So, we can write code like this:

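#include <cmath>

// Absolute relative error of x with respect to the reference value a.
// a must be nonzero, otherwise this divides by zero.
double calc_error(double a, double x)
{
  return std::abs(x - a) / std::abs(a);
}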

to compute the absolute relative error, for example, as seen here.

Are there more types of errors to study, apart from the absolute error and the absolute relative error?

Here are some of my data to play with:

serial gives:
-250207683.634793 -1353198687.861288 2816966067.598196 -144344843844.616425 323890119928.788757
distributed gives:
-250207683.634692 -1353198687.861386 2816966067.598891 -144344843844.617096 323890119928.788757

and then I can expand the idea(s) to the actual data and results.
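
As a minimal sketch of how the two bounds from the book could be turned into numbers for data like the above: solving |x - x_c| <= 1/2 * 10^(-d) for d gives d <= -log10(2 * |x - x_c|), and solving |x - x_c|/|x| <= 5 * 10^(-s) for s gives s <= -log10(|x - x_c| / (5 * |x|)). The function names below (abs_error, rel_error, accurate_decimals, significant_digits, report) are hypothetical. Treating the serial value as the reference x, and reporting digits10 when the two values are identical, are assumptions made only for this illustration.

```cpp
#include <cmath>
#include <cstdio>
#include <limits>

// Absolute error |x - x_c|, with x taken as the reference value.
double abs_error(double x, double x_c)
{
    return std::abs(x - x_c);
}

// Absolute relative error |x - x_c| / |x|; assumes x != 0.
double rel_error(double x, double x_c)
{
    return std::abs(x - x_c) / std::abs(x);
}

// Largest integer d such that err <= 1/2 * 10^(-d).
// d can be negative when the error is large.
// Assumption: a zero error is reported as digits10 (15 for double).
int accurate_decimals(double err)
{
    if (err == 0.0) return std::numeric_limits<double>::digits10;
    return static_cast<int>(std::floor(-std::log10(2.0 * err)));
}

// Largest integer s such that rel <= 5 * 10^(-s), with the same cap.
int significant_digits(double rel)
{
    if (rel == 0.0) return std::numeric_limits<double>::digits10;
    return static_cast<int>(std::floor(-std::log10(rel / 5.0)));
}

// Prints absolute error, relative error, d and s for each pair.
void report(const double* x, const double* x_c, int n)
{
    for (int i = 0; i < n; ++i) {
        const double ae = abs_error(x[i], x_c[i]);
        const double re = rel_error(x[i], x_c[i]);
        std::printf("%.6e %.6e %d %d\n",
                    ae, re, accurate_decimals(ae), significant_digits(re));
    }
}
```

For the first pair above, the values differ by about 1.0e-4 in absolute terms and about 4.0e-13 in relative terms, which corresponds to d = 3 and s = 13. The last pair is identical, so it hits the assumed cap.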


Solution

  • It doesn't get much more complicated than absolute and absolute relative errors. There is another method that compares the integer representations of floating-point values, the idea being that you want your "tolerance" to adapt to the magnitude of the numbers you are comparing (specifically because representable numbers become sparser as the magnitude grows).

    All in all, I think your question is very similar to floating-point comparison, for which there is this excellent guide, and this more exhaustive but much longer paper.

    It might also be worth throwing in these for comparing floating point values:

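    #include <algorithm>  // std::max, std::min
    #include <cmath>      // std::abs
    #include <limits>
    
    // Equal if |a - b| is within epsilon relative to the SMALLER magnitude.
    // The smallest normal value acts as a floor for inputs near zero.
    template <class T>
    struct fp_equal_strict
    {
        inline bool operator() ( const T& a, const T& b ) const
        {
            return std::abs(a - b) 
                <= std::max(
                    std::numeric_limits<T>::epsilon() * std::min( std::abs(a), std::abs(b) ),
                    std::numeric_limits<T>::min()
                );
        }
    };
    
    // Equal if |a - b| is within epsilon relative to the LARGER magnitude.
    template <class T>
    struct fp_equal_loose
    {
        inline bool operator() ( const T& a, const T& b ) const
        {
            return std::abs(a - b) 
                <= std::max(
                    std::numeric_limits<T>::epsilon() * std::max( std::abs(a), std::abs(b) ),
                    std::numeric_limits<T>::min()
                );
        }
    };
    
    // a is greater than b by more than the relative tolerance.
    // Strict '>' so that two equal values (including 0 and 0) are not "greater".
    template <class T>
    struct fp_greater
    {
        inline bool operator() ( const T& a, const T& b ) const
        {
            return (a - b) > std::numeric_limits<T>::epsilon() * std::max( std::abs(a), std::abs(b) );
        }
    };
    
    // a is less than b by more than the relative tolerance.
    template <class T>
    struct fp_lesser
    {
        inline bool operator() ( const T& a, const T& b ) const
        {
            return (b - a) > std::numeric_limits<T>::epsilon() * std::max( std::abs(a), std::abs(b) );
        }
    };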