Search code examples
mathmathematical-optimizationminimizationnewtons-method

Gauss Newton minimization for multi-input multi-output function


Given the input-output function in the matrix form:

|y1|   =  |p1  p2|   |x1|     |p5| 
|y2|    = |p3  p4| * |x2|  +  |p6|

with p1−p6 parameters.I want to minimize the least square error using Gauss-Newton method. Suppose we have 100 measurements. My question is about calculation and size of residual vector.

|r1|   |y1|  ( |p1  p2|   |x1|   |p5| )
|r2| = |y2| -( |p3  p4| ∗ |x2| + |p6| )

ri= output − f(input,parameters)

err = Sum(ri2)

In order to calculate parameters with minimized-error, we have:

pi+1=pi + Δ

Δ= (JT*J)-1 * JT * ri

I think the size of each is as follows:

input vector(x) : 100x2

output vector(y): 100x2

resuidual (r) : 100x2

Jacobian (J) : 100x6

parameters(pi) : 6x1 (six parameters)

As you can see, the size of Δ would be 6x2 that seems not consistent with p_i+1 Now is my residual vector calculation process correct? if yes how can I compute parameter's vector? And if not what is the correct answer?

Another thing is the way of calculation J (Jacobian) with respect to parameters matrix.

Thank you so much.


Solution

  • The Delta vector is the same size as the parameter vector.

    In fact in this case one can solve for the parameters without doing a full least squares. Unfortunately I've not been able to find a reference on the web, so I've attempted to explain the method:

    We want to find a matrix M and a vector t so that, given N points Y[] and and N points X[], the error S is as small as possible, where

    S = Sum{ i | (Y[i]-M*X[i]-t)'*(Y[i]-M*X[i]-t)}
    

    (' is transpose).

    The drill is:

    a/ compute the average Y^ of the Y[] and subtract it from each Y[i], getting y[i]

    b/ compute the average X^ of the X[] and subtract it from each X[i], getting x[i]

    c/ compute the matrices

    A = Sum{ i | y[i]*x[i]'} 
    

    and

    C = Sum{ i | x[i]*x[i]'}
    

    d/ if C is not invertible (which means that all the X[] lie on a line) you're a bit stuck, or at least this method fails; otherwise compute

    M = A*inverse(C)
    

    e/ finally, compute

    t = Y^-M*X^
    

    Why does this work?

    First off, if we think of M as fixed, then it's easy to see that the t that minimises S is just the average of the Y[i]-M*X[i]; so we may as well use this t when finding M. Substituting t into the formula for S we get

    S = Sum{ i | (y[i]-M*x[i])'*(y[i]-M*x[i])}
    

    Now let Tr() denote the function that maps a matrix to its trace (the sum of the diagonal elements). A trick that is quite often useful is that for a vector v we have

    v'*v = Tr( v'*v) = Tr( v*v')
    

    Applying this to S, and expanding the product, we get

    S = Sum{ i | y[i]'*y[i]} - Tr( A*M') - Tr( M*A') + Tr( M*C*M')
    

    If C is invertible we can complete the square and get (here D = inverse(C))

    S = Sum{ i | y[i]'*y[i]} - Tr( A*D*A') + Tr( (M-A*D)*C*(M-A*D)'}
    

    The first two terms don't depend on M; the third is never negative and can be made zero by choosing M=A*D.