math mathematical-optimization minimization newtons-method

Gauss Newton minimization for multi-input multi-output function

Given the input-output function in the matrix form:

|y1|   =  |p1  p2|   |x1|     |p5| 
|y2|    = |p3  p4| * |x2|  +  |p6|

with p1−p6 parameters.I want to minimize the least square error using Gauss-Newton method. Suppose we have 100 measurements. My question is about calculation and size of residual vector.

|r1|   |y1|  ( |p1  p2|   |x1|   |p5| )
|r2| = |y2| -( |p3  p4| ∗ |x2| + |p6| )

r_i= output − f(input,parameters)

err = Sum(r_i²)

In order to calculate parameters with minimized-error, we have:

p_i+1=p_i + Δ

Δ= (J^T*J)^-1 * J^T * r_i

I think the size of each is as follows:

input vector(x) : 100x2

output vector(y): 100x2

resuidual (r) : 100x2

Jacobian (J) : 100x6

parameters(pi) : 6x1 (six parameters)

As you can see, the size of Δ would be 6x2 that seems not consistent with p_i+1 Now is my residual vector calculation process correct? if yes how can I compute parameter's vector? And if not what is the correct answer?

Another thing is the way of calculation J (Jacobian) with respect to parameters matrix.

Thank you so much.

Solution

The Delta vector is the same size as the parameter vector.

In fact in this case one can solve for the parameters without doing a full least squares. Unfortunately I've not been able to find a reference on the web, so I've attempted to explain the method:

We want to find a matrix M and a vector t so that, given N points Y[] and and N points X[], the error S is as small as possible, where

S = Sum{ i | (Y[i]-M*X[i]-t)'*(Y[i]-M*X[i]-t)}

(' is transpose).

The drill is:

a/ compute the average Y^ of the Y[] and subtract it from each Y[i], getting y[i]

b/ compute the average X^ of the X[] and subtract it from each X[i], getting x[i]

c/ compute the matrices

A = Sum{ i | y[i]*x[i]'}

and

C = Sum{ i | x[i]*x[i]'}

d/ if C is not invertible (which means that all the X[] lie on a line) you're a bit stuck, or at least this method fails; otherwise compute

M = A*inverse(C)

e/ finally, compute

t = Y^-M*X^

Why does this work?

First off, if we think of M as fixed, then it's easy to see that the t that minimises S is just the average of the Y[i]-M*X[i]; so we may as well use this t when finding M. Substituting t into the formula for S we get

S = Sum{ i | (y[i]-M*x[i])'*(y[i]-M*x[i])}

Now let Tr() denote the function that maps a matrix to its trace (the sum of the diagonal elements). A trick that is quite often useful is that for a vector v we have

v'*v = Tr( v'*v) = Tr( v*v')

Applying this to S, and expanding the product, we get

S = Sum{ i | y[i]'*y[i]} - Tr( A*M') - Tr( M*A') + Tr( M*C*M')

If C is invertible we can complete the square and get (here D = inverse(C))

S = Sum{ i | y[i]'*y[i]} - Tr( A*D*A') + Tr( (M-A*D)*C*(M-A*D)'}

The first two terms don't depend on M; the third is never negative and can be made zero by choosing M=A*D.