Search code examples
matlabneural-networkvectorizationoctavegradient-descent

Computing the gradients for a two-layer neural network


I would like to know if the following MATLAB/Octave code can be vectorized?

function grads = compute_grads(data, ann, lambda)
    [~, N] = size(data.X);
    % First propagate the data
    S = evaluate(data.X, ann);
    G = -(data.Y - S{2});

    % Second layer gradient is easy.
    l2g.W = G*S{1}';
    l2g.b = mean(G)';
    G = G' * ann{2}.W;

    [m, d] = size(ann{1}.W);
    [K, ~] = size(ann{2}.W);

    % I would like to vectorize this calculation.
    l1g.W = zeros(m, d);
    l1g.b = mean(G)';
    for i = 1:N
        x = data.X(:, i);
        g = G(i, :);
        l1 = S{1}(:, i);
        g = g * diag(l1 > 0);
        l1g.W = l1g.W + g'*x';
    end
    grads = {l1g, l2g};
    for k=1:length(grads)
        grads{k}.W = grads{k}.W/N + 2*lambda*ann{k}.W;
    end
end

The code computes the gradients for a two-layer neural network. The second layer has a softmax activation function as shown by line 4 G = -(data.Y - S{2});. The first layer has ReLU activation implemented by the gunk in the for-loop which operates on each sample at a time.

As you can see, there is an explicit for-loop in the middle. Are there any array/matrix functions one can use instead to make the looping implicit?


Solution

  • The loop can be reduced to:

    l1g.W = (data.X * (G .* (S{1} > 0).')).';
    

    Explanation:

    In vectorization we should avoid unnecessary operations. For example in

    g = g * diag(l1 > 0);;
    

    we can use element-wize multiplication to achieve the same thing:

    g = g .* (l1.' > 0);
    %or
    g = g .* (l1 > 0).';
    

    Using that we can place some operations outside of the loop:

    l1g.W = zeros(m, d);
    
    G = G .* (S{1} > 0).';
    
    for i = 1:N
        x = data.X(:, i);
        g = G(i, :);
        l1g.W = l1g.W + g'*x';
    end
    

    So we have something like this:

    W=0;
    for i = 1:N
        W = W + something(i);
    end
    

    that can be written as:

    W = sum(something);
    

    Our loop can be reduced to:

    l1g.W = sum(some_structrue_created_by_vectorizing(g'*x'));
    

    We can use functions such as bsxfun to create such a structure (i.e. a 3D matrix) but often such a structure requires a large amount of memory and the loop may be more efficient than vectorization. But wait we want to do sum of product of g and x so we can [and should always] think about using vector-matrix or matrix-matrix multiplication because they are very fast operations. As we are performing outer product of g and x so matrix-matrix multiplication is the right choice.

    G = G .* (S{1} > 0).';
    l1g.W  = (data.X * G).'
    

    or

    l1g.W = (data.X * (G .* (S{1} > 0).')).';