In performing linear algebra operations, how does deep learning vary the number of neurons in each layer while maintaining correct algebra operations?
This code generates matrices of various dimensions and calculates their dot product; I've attempted to concentrate on just the core operation of moving values through a network.
import numpy as np

def gm(m, n):
    # low == high == 1, so every entry is exactly 1.0,
    # which keeps the dot products easy to trace by hand
    return np.random.uniform(1, 1, (m, n))

x = gm(4, 2)        # 4 x 2
print('x', x)
m1 = gm(2, 3)       # 2 x 3
print('m1', m1)
d1 = np.dot(x, m1)  # (4 x 2) . (2 x 3) -> 4 x 3
print('d1', d1)
prints:
x [[1. 1.]
[1. 1.]
[1. 1.]
[1. 1.]]
m1 [[1. 1. 1.]
[1. 1. 1.]]
d1 [[2. 2. 2.]
[2. 2. 2.]
[2. 2. 2.]
[2. 2. 2.]]
If the desired output matrix is 2x1, how is this produced? In other words, how do you produce a 2x1 matrix from matrix d1?
Frameworks such as Keras abstract this, but how does the abstraction take place?
The mapping is relatively simple, and I will focus purely on MLPs (other neural networks with more complex structure do many other things, but the core idea is the same).
Say your input is a batch of size [B x D], where B is the number of samples (which can be 1) and D is the number of features (input dimensions). As the output you want [B x K], where K is the number of outputs.
A typical MLP is just a sequence of affine transformations, each followed by some (point-wise) nonlinearity f:
h_{i+1} = f( h_i W_i + B_i)
where h_0 = X (inputs), and h_N is the output
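In numpy terms, one step of that recursion is a matrix multiply, a bias addition that is broadcast across the rows, and a point-wise function. A minimal sketch of that single step, with tanh standing in for f purely as an example:
import numpy as np

def layer(h, W, b, f=np.tanh):
    # h: [B x D_in], W: [D_in x D_out], b: [1 x D_out]
    # returns f(h W + b), shape [B x D_out]; b is added to every row
    return f(np.dot(h, W) + b)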
Say you want one hidden layer with Z neurons. Then all you have to do is create two matrices W and two vectors B: one pair maps the inputs to Z dimensions, and the other maps from Z to the desired K:
W_1 is D x Z
B_1 is 1 x Z
W_2 is Z x K
B_2 is 1 x K
Consequently we have
f(X W_1 + B_1) W_2 + B_2
X W_1 is [B x D] [D x Z] = [B x Z]
X W_1 + B_1 is [B x Z] + [1 x Z] = [B x Z] # note that summation
# is overloaded in the sense that it is adding the
# same B to each row of the argument
f(X W_1 + B_1) is [B x Z]
f(X W_1 + B_1) W_2 is [B x Z] [Z x K] = [B x K]
f(X W_1 + B_1) W_2 + B_2 is [B x K] + [1 x K] = [B x K]
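To make those shapes concrete, here is a sketch of the full two-layer computation in numpy, reusing the question's gm helper and picking B = 4, D = 2, Z = 3, K = 1 arbitrarily (tanh stands in for f):
import numpy as np

def gm(m, n):
    return np.random.uniform(1, 1, (m, n))

B, D, Z, K = 4, 2, 3, 1
X = gm(B, D)                       # inputs,      [B x D]
W1, B1 = gm(D, Z), gm(1, Z)        # first pair:  maps D -> Z
W2, B2 = gm(Z, K), gm(1, K)        # second pair: maps Z -> K

h = np.tanh(np.dot(X, W1) + B1)    # [B x Z]; B1 is added to every row
y = np.dot(h, W2) + B2             # [B x K]
print(y.shape)                     # (4, 1)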
Analogously, with more hidden layers you just keep matrix-multiplying on the right by a matrix of size [dimension_of_previous_one x desired_output_dimension], which is just the regular linear projection operation from mathematics (and the biases make it affine).
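With more layers the same pattern simply repeats. Continuing the sketch above (same gm helper and numpy import, tanh again as the nonlinearity, layer sizes chosen arbitrarily), you can stack as many layers as you like in a loop:
sizes = [2, 3, 5, 1]                   # D, two hidden sizes, K -- arbitrary example
Ws = [gm(a, b) for a, b in zip(sizes[:-1], sizes[1:])]
Bs = [gm(1, b) for b in sizes[1:]]

h = gm(4, sizes[0])                    # a batch of 4 inputs, [4 x D]
for W, Bi in zip(Ws[:-1], Bs[:-1]):
    h = np.tanh(np.dot(h, W) + Bi)     # hidden layers: affine + nonlinearity
y = np.dot(h, Ws[-1]) + Bs[-1]         # output layer kept affine here
print(y.shape)                         # (4, 1)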