Search code examples
pythonnumpymathnewtons-method

Newton raphson fails to approximate beyond two decimal points


I've written a program which finds roots of a two function system using the Newton Raphson method. Its goal is to solve for two unknowns of a closed vector loop (sketches below). The issue is that it only works when the error tolerance is 0.01 and greater, in other cases (for example, when it's 0.001) it falls into an oscillation and seemingly never converges. I find this behaviour odd, since the fact that it succeeds in getting to 0.01 precision should prove its ability to land a guess inside the convergence radius, and from there on further approximation shouldn't face any problems...

import numpy as np
import random as rnd

TOLERANCE = 0.01
ITER_LIMIT = 50

track_x = 0
track_y = 0
track_incline = np.deg2rad(60)

r41 = 12.2/2
r1 = 28.88 - 10.95
r2 = 22.65 - track_x
r26 = 8/2 - track_y
r3 = 8
r4 = 12.2

theta41 = np.pi
theta3 = np.deg2rad(30)

def f1(x, y):
    return r41*np.sin(theta41) + r1*np.cos(theta41) + r2*np.sin(y) + r26*np.cos(y) - x*np.sin(y + track_incline) + r3*np.sin(theta3)

def f2(x, y):
    return r41*np.cos(theta41) - r1*np.sin(theta41) + r2*np.cos(y) - r26*np.sin(y) - x*np.cos(y + track_incline) + r3*np.cos(theta3) + r4

def jacobian_mat(x, y):
    return np.array([
        [-np.sin(y + track_incline), r2*np.cos(y) - r26*np.sin(y) - x*np.cos(y + track_incline)],
        [-np.cos(y + track_incline), -r2*np.sin(y) - r26*np.cos(y) + x*np.sin(y + track_incline)]
    ])

def compute_newton_raphson(x0, y0):
    for i in range(ITER_LIMIT):
        result = -np.matmul(np.array([f1(x0, y0), f2(x0, y0)]), np.linalg.inv(jacobian_mat(x0, y0))) + np.array([x0, y0])
        x = result[0]
        y = result[1]

        print(x, y)

        if max(np.abs(x - x0), np.abs(y - y0)) <= TOLERANCE:
            return (x, y)
        else:
            x0 = x
            y0 = y

    return None

while True:
    x0 = rnd.randrange(-100, 100)
    y0 = rnd.randrange(-100, 100)

    res = compute_newton_raphson(x0, y0)
    if res != None:
        print(f"RESULT:\nX = {res[0]}mm\nY = {np.rad2deg(res[1])}deg")
        break

Here's a sketch of the problem that I'm modeling this program for, just in case it proves useful.

Angles

I had zero success playing around with the iteration count. Also double checked if f1 and f2 are properly constructed.

Any insight is greatly appreciated!


Solution

  • Your Newton-Raphson formula is wrong.

    Newton-Raphson says:

    Xₙ₊₁ = Xₙ - J⁻¹F(Xₙ)

    You are computing

    Xₙ₊₁ = Xₙ - F(Xₙ)ᵀJ⁻¹

    That is, equivalent of
    Xₙ₊₁ = Xₙ - (J⁻¹)ᵀF(Xₙ)

    Matrix multiplication is not commutative.

    And matmul, or dot or @ have different interpretation of 1D-array, depending on where they are in the operation

    See

    M=np.array([[1,2],[3,4]])
    X=np.array([5,6])
    Col=np.array([[5],[6]]) # Same as X, as a column (a 2D array of 1 column)
    

    Then

    M@X # or np.dot(M,X) or np.matmult(M,X)
    #[17,39]
    # That is same as M@Col, but in 1D. So same as (M@Col).flatten()
    
    X@M # or np.dot(X,M) or np.matmul(X,M)
    #[23,34]
    # That is same as Col.T@M, but in 1D
    # Note that Col@M wouldn't work (you can't multiply a column by a matrix)
    

    I insist on this column-matrix vs 1d-vector, because it explains why it worked in the first place. In maths, in matrix operation, we usually represent vectors as columns. So MX makes sense. XM doesn't even make sense: you can't multiply a 2×1 matrix by a 2×2 matrix.

    The closest thing to multiplying (on the left) a vector by a matrix would be XᵀM, that is X as a row, (Col.T in previous code) by a matrix.

    And XᵀM is the same as (MᵀX)ᵀ.

    So what you are computing is Xₙ-((J⁻¹)ᵀF(Xₙ))ᵀ
    which shouldn't make sense, since that is a column minus a row. But since X, and ((J⁻¹)ᵀF(Xₙ)) are 1D array anyway, the implicit ᵀ (the last one) is void. But the one on Mᵀ is still there and pretty important.

    So, strictly speaking, in math, what you compute doesn't make sense.

    But numpy operation (matmul, dot, @ — with some variants, but irrelevant in this case) does allow to combine 2D array (matrix) with 1D array. And when you do so, the 1D-array is, interpreted as a column or as a row, depending on what makes sense.

    Reason why your code works (I mean: why it doesn't raise an error), ignoring the last implicit ᵀ. But still, what you compute is what you would get if you transposed the inverse of the jacobian matrix before using Newton-Raphson

    Additional notes

    There are other problems in your code. That are not the source of your problem. But could be, depending on the parameters.

    1. y is obviously an angle. In radians. Why choose it between -100 and 100?

    2. for both x and y, why use integer initial values? For x, you may say that if x normal range is between -100 and 100, then there is probably a integer starting point that is not so far (compared the whole range width) from the solution.

    For for y, once corrected the 1st point, that means that you start only from -3, -2, -1, 0, 1, 2, or 3. If all of those starting points are too far from the solution for Newton-Raphson to converge, then you could try for ever... (fortunately, it is not the case. Plus, the two errors compensates a bit. Since 2π is, famously, not a rational, choosing a integer between -100 and 100 is equivalent to choosing among 200 different values between -π and π. for example 7 is equivalent to 0.7168...)

    But, well, the idea behind your infinite while loop is to retry from another starting point if the iteration do not converge, until you find a starting point from which it does. So it is not a very good idea to start from a very finite set of possible starting points

    1. It is not a good idea to solve J@(Xₙ₊₁-Xₙ) = - F(Xₙ) by inverting the jacobian. It is better to use np.solve (as Ali did in his answer) to do so (but, also, if we start trying not to do too much of numpy's job, then, probably, a better idea would be to use scipy.newton in the first place)

    2. Rather than a short number of iterations, I would watch for absolute value of x0/y0 to detect divergence. If grows too much, you can stop the iteration and retry from a random value. And then, you can have a insanely big value for ITER_LIMIT since it should never be used (either the iteration converge and you are ok. Or it diverges, and you'll detect it. Or, very unlikely, it wander randomly but within a reasonible range, and, after all, that is the same as your random. The insanely big ITER_LIMIT role would be to ensure the iterations are stopped for the offside chance you end up in a stable cycle)