Tags: numpy, deep-learning, keras, linear-algebra, matrix-multiplication

Understanding varying neurons in deep learning?


In terms of linear algebra operations, how does deep learning vary the number of neurons in each layer while keeping the matrix operations valid?

This code generates matrices of various dimensions and calculates their dot product; I've tried to concentrate on just the core operation of moving values through a network.

import numpy as np

def gm(m, n):
  # low == high == 1, so this returns an m x n matrix filled with 1.0
  return np.random.uniform(1, 1, (m, n))

x = gm(4, 2)          # 4 samples, 2 features each
print('x', x)

m1 = gm(2, 3)         # weight matrix mapping 2 features to 3 neurons
print('m1', m1)

d1 = np.dot(x, m1)    # (4 x 2) dot (2 x 3) -> (4 x 3)
print('d1', d1)

prints:

x [[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]
m1 [[1. 1. 1.]
 [1. 1. 1.]]
d1 [[2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]]

If the output matrix is a 2x1, how is this produced? In other words, how do I produce a 2x1 matrix from matrix d1?

Frameworks such as Keras abstract this, but how does that abstraction take place?


Solution

  • The mapping is relatively simple, and I will focus purely on MLPs (other neural networks with more complex structure do many other things, but the core idea is the same).

    Say your input is a batch of size [B x D], where B is the number of samples (can be 1) and D is the number of features (input dimensions). As the output you want [B x K], where K is the number of outputs.

    A typical MLP is just a sequence of affine transformations, each followed by some (point-wise) nonlinearity f:

    h_{i+1} = f( h_i W_i + B_i)
    

    where h_0 = X (the inputs) and h_N is the output.
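
    A minimal NumPy sketch of this recurrence (the helper name forward, the lists weights/biases, and the choice of np.tanh as f are illustrative assumptions, not something from the question):

    import numpy as np

    def forward(X, weights, biases, f=np.tanh):
        # h_0 = X; each step applies h_{i+1} = f(h_i W_i + B_i)
        # (in practice the final layer often uses the identity instead of f,
        #  as in the expression f(X W_1 + B_1) W_2 + B_2 below)
        h = X
        for W, B in zip(weights, biases):
            h = f(h @ W + B)
        return h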

    Say I want one hidden layer with Z neurons. Then all I have to do is create two matrices W and two vectors B: one pair maps the inputs to Z dimensions, and the other maps from Z to the desired K (a concrete NumPy sketch of these shapes follows the walkthrough below):

    W_1 is D x Z
    B_1 is 1 x Z
    
    W_2 is Z x K
    B_2 is 1 x K
    

    Consequently we have

    f(X W_1 + B_1) W_2 + B_2   
    
    X W_1 is [B x D] [D x Z] = [B x Z]
    X W_1 + B_1 is [B x Z] + [1 x Z] = [B x Z]   # note that summation 
                        # is overloaded in the sense that it is adding the
                        # same B to each row of the argument
    f(X W_1 + B_1) is [B x Z]
    f(X W_1 + B_1) W_2 is [B x Z] [Z x K] = [B x K]
    f(X W_1 + B_1) W_2 + B_2 is [B x K] + [1 x K] = [B x K]
    
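    Putting concrete numbers on this to match the question: with D = 2 input features (like x above), a hidden width of Z = 3 and K = 1 output, the final multiplication by a [3 x 1] matrix is exactly what turns a [B x 3] activation (the shape of d1) into a [B x 1] output. A minimal sketch (the uniform(-1, 1) initialisation and np.tanh nonlinearity are illustrative choices, not prescribed by anything above):

    import numpy as np

    B, D, Z, K = 4, 2, 3, 1                       # batch, inputs, hidden neurons, outputs
    X  = np.ones((B, D))                          # same shape as the question's x
    W1 = np.random.uniform(-1, 1, (D, Z)); B1 = np.zeros((1, Z))
    W2 = np.random.uniform(-1, 1, (Z, K)); B2 = np.zeros((1, K))

    h = np.tanh(X @ W1 + B1)                      # [4 x 2][2 x 3] + [1 x 3] -> [4 x 3]
    y = h @ W2 + B2                               # [4 x 3][3 x 1] + [1 x 1] -> [4 x 1]
    print(y.shape)                                # (4, 1): one output per sample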

    Analogously, with more hidden layers you just matrix-multiply on the right by a matrix of size [dimension_of_previous_one x desired_output_dimension], which is just the regular linear projection operation from mathematics (and the biases make it affine).
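
    Since the question mentions Keras: this shape bookkeeping is exactly what such frameworks hide. A sketch of the same D=2 -> Z=3 -> K=1 network (a plausible minimal model, not code from the question); each Dense layer owns one W and one B, and its input dimension is inferred from the previous layer:

    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(2,)),                       # D = 2 input features
        keras.layers.Dense(3, activation='tanh'),      # W1 is 2 x 3, B1 has 3 entries
        keras.layers.Dense(1),                         # W2 is 3 x 1, B2 has 1 entry
    ])
    print([w.shape for w in model.get_weights()])      # [(2, 3), (3,), (3, 1), (1,)]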