math, matrix, octave, calculus

Differentiating a scalar with respect to matrix


I have a scalar function which is obtained by iterative calculations. I wish to differentiate it (find the directional derivative) with respect to a matrix, elementwise. How should I employ the finite difference approximation in this case? Do diff or gradient help here? Note that I only want numerical derivatives. The typical code that I would work on is:

n=4;
for i=1:n
  for x(i)=-2:0.04:4;
    for y(i)=-2:0.04:4;
      A(:,:,i)=[sin(x(i)), cos(y(i));2sin(x(i)),sin(x(i)+y(i)).^2];
      B(:,:,i)=[sin(x(i)), cos(x(i));3sin(y(i)),cos(x(i))];
      R(:,:,i)=horzcat(A(:,:,i),B(:,:,i));
      L(i)=det(B(:,:,i)'*A(:,:,i)B)(:,:,i));

      %how to find gradient of L with respect to x(i), y(i)
      grad_L=tr((diff(L)/diff(R)')*(gradient(R))
    endfor;
  endfor;
endfor;

I know that the last part, for grad_L, would raise an error saying the dimensions don't match. How do I proceed to solve this? Note that the gradient or directional derivative of a scalar function f of a matrix variable X is given by nabla(f) = trace((partial f / partial x_{ij}) * X_dot), where x_{ij} denotes the elements of the matrix and X_dot denotes the gradient of the matrix X.


Solution

  • Both your code and explanation are very confusing. You iterate over n = 4, but you don't do anything with your inputs or outputs, and you overwrite everything, so I will ignore the n aspect for now since you don't seem to be making any use of it. Furthermore, you have many syntactical mistakes which look more like maths or pseudocode than any attempt to write valid Matlab / Octave.

    But, essentially, you seem to be asking: "I have a function which, for each (x,y) coordinate on a 2D grid, calculates a scalar output L(x,y)", where the calculation leading to L involves multiplying matrices and then taking the determinant of the product. Here's how to produce such an array L:

    X = -2 : 0.04 : 4;
    Y = -2 : 0.04 : 4;
    X_indices = 1 : length(X);
    Y_indices = 1 : length(Y);
    L = zeros(length(X), length(Y));   % preallocate; L(Ind_x, Ind_y) holds L(x,y)
    
    for Ind_x = X_indices
      for Ind_y = Y_indices
        x = X(Ind_x);   y = Y(Ind_y);
        A = [sin(x), cos(y); 2 * sin(x), sin(x+y)^2];
        B = [sin(x), cos(x); 3 * sin(y), cos(x)    ];
        L(Ind_x, Ind_y) = det (B.' * A * B);
      end
    end
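
    If only the derivative at a single point is needed, the same scalar can be wrapped in a function handle and differenced directly. This is a minimal sketch, assuming central differences with a hand-picked step; the names A_of, B_of, Lfun, h, x0 and y0 are illustrative and do not come from the question:

    ```octave
    % Same A and B as in the loop above, written as functions of (x, y)
    A_of = @(x, y) [sin(x), cos(y); 2 * sin(x), sin(x+y)^2];
    B_of = @(x, y) [sin(x), cos(x); 3 * sin(y), cos(x)    ];
    Lfun = @(x, y) det(B_of(x, y).' * A_of(x, y) * B_of(x, y));

    h  = 1e-5;             % step size (assumed; tune it for your problem)
    x0 = 0.5;  y0 = 1.0;   % evaluation point (arbitrary example)

    % Central differences: truncation error shrinks like h^2
    dLdx = (Lfun(x0 + h, y0) - Lfun(x0 - h, y0)) / (2 * h);
    dLdy = (Lfun(x0, y0 + h) - Lfun(x0, y0 - h)) / (2 * h);
    grad_L = [dLdx, dLdy];
    ```

    Too large a step increases the truncation error and too small a step amplifies rounding error, so it is worth repeating the calculation with a second step size and checking that the two results agree.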
    

    You then want the gradient of L, which at each grid point is a vector. Setting aside the maths you mentioned for a moment, if you simply want to use the gradient function correctly, apply it directly to L, pass the grid vectors so that it knows the spacing between the elements of L, and collect two outputs, one array per component of the gradient:

    % L(Ind_x, Ind_y) has x along rows; gradient differentiates along columns (y) first
    [gLy, gLx] = gradient(L, Y, X);
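
    Because gradient treats the columns of its input as the x direction, it is worth checking the output order on a function whose gradient is known. The following is a sketch under assumptions: the test function F = x^2 + 3y and the coarser Y grid are made up for the check, and F is stored as F(Ind_x, Ind_y), the same layout as L above:

    ```octave
    X = -2 : 0.04 : 4;
    Y = -2 : 0.5 : 3;            % deliberately a different length from X

    [XX, YY] = ndgrid(X, Y);     % XX(i,j) = X(i), YY(i,j) = Y(j): rows follow x
    F = XX.^2 + 3 * YY;          % known gradient: dF/dx = 2x, dF/dy = 3

    % First output differentiates along columns (here y), second along rows (here x)
    [dFdy, dFdx] = gradient(F, Y, X);

    % Compare on interior rows only; the boundary rows use one-sided differences
    interior = 2 : length(X) - 1;
    err_x = max(max(abs(dFdx(interior, :) - 2 * XX(interior, :))));
    err_y = max(max(abs(dFdy - 3)));
    fprintf("max error: dF/dx = %g, dF/dy = %g\n", err_x, err_y);
    ```

    When X and Y have the same length, as in the code above, a swapped output order does not raise an error, which is why this check uses grids of different lengths.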