algorithm matlab optimization numerical-methods

Optimizing algorithm calculating (sin(x)-x)*x^{-3} (in matlab)

My task is to write optimal program that calculates matrix Y, given matrix X, where:

y = (sin(x)-x) x^-3

Here's the code I have written so far:

n = size(X, 1);
m = size(X, 2);
Y = zeros(n, m);
d = n*m; 

for i = 1:d
    x = X(i);
    if abs(x)<0.1
        Y(i) = -1/6+x.^2/120-x.^4/5040+x.^6/362880;
    else
        Y(i) = (sin(x)-x).*(x.^(-3));
    end
end

So, generally the formula was inaccurate around 0, so I have approximated it using Taylor theorem.

Unfortunately this program has accuracy of 91% and efficiency of only 24% (so it's 4 times slower than the optimal solution).

The tests are around 13 million samples, out of which around 6 million have value of less than 0.1. The range of samples is (-8π , 8π).

The target accuracy (100%) is 4*epsilon where epsilon equals 2^(-52) (that means that numbers calculated by program shouldn't be larger or smaller than numbers calculated "perfectly" than 4*epsilon).

100*epsilon means accuracy of 86%.

Do you have any ideas on how to make it faster and more accurate? I'm looking both for mathematical tricks on how to further transform given formula, and general MATLAB tips that can accelerate programs?

EDIT: Using Horner method, I have managed to bring up efficiency up to 81% (accuracy still 91%) with this program:

function Y = main(X)

Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = horner(X(i));

function y = horner (x)

pow = x.*x;
y = -1/6+pow.*(1/120+pow.*(-1/5040+pow./362880));

Do you have any further ideas on how to improve it?

Solution

You can replace your loop with vectorized code. This is usually more efficient than loop because the loop has a conditional in it, which is bad for branch prediction:

Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = -1/6+X(i).^2/120-X(i).^4/5040+X(i).^6/362880;

Rewriting the primary equation to avoid the cubic root yields a 3x speedup for that computation:

Y = (sin(X)./X - 1) ./ (X.*X);

Speed comparison:

The following script compares timing for this method compared to OP's loop code. I use data that has 7 million values uniformly distributed in (-8π, 8π), and another 6 million values uniformly distributed in (-0.1,0.1).

OP's loop code takes 2.4412 s, and the vectorized solution takes 0.7224 s. Using OP's Horner method and the rewritten sin expression it takes 0.1437 s.

X = [linspace(-8*pi,8*pi,7e6), linspace(-0.1,0.1,6e6)];
timeit(@()method1(X))
timeit(@()method2(X))

function Y = method1(X)
n = size(X, 1);
m = size(X, 2);
Y = zeros(n, m);
d = n*m; 

for i = 1:d
    x = X(i);
    if abs(x)<0.1
        Y(i) = -1/6+x.^2/120-x.^4/5040+x.^6/362880;
    else
        Y(i) = (sin(x)-x).*(x.^(-3));
    end
end
end

function Y = method2(X)
Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = -1/6+X(i).^2/120-X(i).^4/5040+X(i).^6/362880;
end

function Y = method3(X)
Y = (sin(X)./X - 1) ./ (X.*X);
i = abs(X) < 0.1;
Y(i) = horner(X(i));
end

function y = horner (x)
pow = x.*x;
y = -1/6+pow.*(1/120+pow.*(-1/5040+pow./362880));
end