I'm playing around with neural nets and wanted to make a clean class implementation that handles nets of any size. Currently, I'm debugging my learning function on 2-layer networks.
In its current state, using logistic activation:
Here's the relevant code:
import numpy as np

def logistic(x, deriv = False):
    '''
    If using the derivative, input must be result of logistic
    '''
    if deriv:
        return x*(1-x)
    return 1/(1+np.exp(-x))

def feed_forw(input, weights):
    '''
    ***Wrapper for input.dot(weights)
    Input should be a np.array the same length as number of input nodes
        - A row of input represents the vector of input nodes
        - Different rows are different input cases
    Weights is a 2D np.array of weights for each input node to each output node
        - dimensions of weights will determine length of output vector
        - top row is weights going from first input node to all output nodes
        - first col is weights going from all input nodes to first output node
    '''
    return input.dot(weights)

class ANN:
    '''
    Artificial Neural Network of Perceptron Design
    Member Attributes:
        Weights: tuple of np.array
            - # of elements defines number of layers
            - shape of each element defines the nodes of each connected pair of layers
        Bias: tuple of np.array
            - added to each node after the first layer on a per layer basis
            - must have same dimensions as output from each corresponding element in Weights
        Target: np.array
            - array representing desired output.
    '''
    def __init__(self, weights, bias = 0, target = None):
        self._weights = weights
        self._bias = bias
        self._target = target

    def __str__(self):
        data = ''
        for w,b in zip(self._weights, self._bias):
            data += f'Weight\n{w}\nbias\n{b}\n'
        return f'{data}Seeking\n{self._target}\n'

    def _forwardProp(self, v, activation):
        '''
        Helper function to Learn
        '''
        out = []
        out.append(v.copy())
        for w,b in zip(self._weights, self._bias):
            out.append(feed_forw(out[-1], w) + b)
            out.append(activation(out[-1]))
        return out

    def setTarget(self, target):
        self._target = target

    def learn(self, input, activation, epoch = 10, eta = 1, debug = False):
        '''
        ***Currently only functions with 2-Layer perceptrons***
        input: np.array
            - a matrix representing each case of input vectors
            - rows are input vectors for a single case
        activation: function object
            - An activation function used to normalize output
        epoch: int
            - test cycles
        eta: int
            - learning parameter
        '''
        for e in range(epoch):
            layers = self._forwardProp(input, activation)
            #layers is a list for keeping track of changes between layers
            #Pattern follows:
            #[input, layer 0 - weighted sum, layer 1 - activation, layer 1(output) -
            # weighted sum, layer 2 - activation, layer 2 ...
            # weighted sum, output layer - activation, output layer]
            #Final element is always network output
            error = layers[-1] - self._target
            delta_out = error * activation(layers[-1], deriv = True)
            #derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
            #derivError_out = delta_out * layers[-3].T*self._weights[-1]
            #EDIT
            derivError_out = delta_out * layers[-3].T
            derivError_bias = delta_out * self._bias[-1].T
            self._weights += -eta*derivError_out
            self._bias += -eta*derivError_bias
            if debug:
                print(f'Epoch {e+1}:\nOutput:\n{layers[-1]}\nError is\n{error}\nDelta Out Node:\n{delta_out}')
                print(f'Weight Increment:\n{derivError_out}\nBias Increment:\n{derivError_bias}')
                print(f'State after training rotation:\n{self}')
            #i = 1
            #while i < len(layers) + 1:
                #This loop will count from the last element of layers, will go back by 2
                #...
                #i += 2

The code used to test and its output:
w2 = np.array([[0.03],
[-0.1]])
b2 = np.array([[0.7]])
nn1 = ANN((w2,), (b2,))
x = np.array([[1,1]])
t = np.array([[0.7]])
nn1.setTarget(t)
nn1.learn(x, logistic, 100, debug = True)
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[-0.04751054]]
Delta Out Node:
[[-0.01077287]]
Weight Increment:
[[-0.00032319]
[ 0.00107729]]
Bias Increment:
[[-0.00754101]]
State after training rotation:
Weight
[[ 0.03032319]
[-0.10107729]]
bias
[[0.70754101]]
Seeking
[[0.7]]
Epoch 2:
Output:
[[0.65402678]]
Error is
[[-0.04597322]]
Delta Out Node:
[[-0.01040263]]
Weight Increment:
[[-0.00031544]
[ 0.00105147]]
Bias Increment:
[[-0.00736028]]
State after training rotation:
Weight
[[ 0.03063863]
[-0.10212876]]
bias
[[0.71490129]]
Seeking
[[0.7]]
...
Epoch 99:
Output:
[[0.69871509]]
Error is
[[-0.00128491]]
Delta Out Node:
[[-0.00027049]]
Weight Increment:
[[-1.08348447e-05]
[ 3.61161491e-05]]
Bias Increment:
[[-0.00025281]]
State after training rotation:
Weight
[[ 0.04006734]
[-0.13355782]]
bias
[[0.93490471]]
Seeking
[[0.7]]
Epoch 100:
Output:
[[0.69876299]]
Error is
[[-0.00123701]]
Delta Out Node:
[[-0.00026038]]
Weight Increment:
[[-1.04328444e-05]
[ 3.47761479e-05]]
Bias Increment:
[[-0.00024343]]
State after training rotation:
Weight
[[ 0.04007778]
[-0.13359259]]
bias
[[0.93514815]]
Seeking
[[0.7]]
'''
#This cell is rerun with
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[0.25248946]]
Delta Out Node:
[[0.05725122]]
Weight Increment:
[[ 0.00171754]
[-0.00572512]]
Bias Increment:
[[0.04007585]]
State after training rotation:
Weight
[[ 0.02828246]
[-0.09427488]]
bias
[[0.65992415]]
Seeking
[[0.4]]
Epoch 2:
Output:
[[0.64426676]]
Error is
[[0.24426676]]
Delta Out Node:
[[0.05598279]]
Weight Increment:
[[ 0.00158333]
[-0.00527777]]
Bias Increment:
[[0.0369444]]
State after training rotation:
Weight
[[ 0.02669913]
[-0.08899711]]
bias
[[0.62297975]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.50544009]]
Error is
[[0.10544009]]
Delta Out Node:
[[0.0263569]]
Weight Increment:
[[ 2.73123106e-05]
[-9.10410354e-05]]
Bias Increment:
[[0.00063729]]
State after training rotation:
Weight
[[ 0.00100894]
[-0.00336312]]
bias
[[0.02354185]]
Seeking
[[0.4]]
Epoch 100:
Output:
[[0.50529672]]
Error is
[[0.10529672]]
Delta Out Node:
[[0.02632123]]
Weight Increment:
[[ 2.65564469e-05]
[-8.85214898e-05]]
Bias Increment:
[[0.00061965]]
State after training rotation:
Weight
[[ 0.00098238]
[-0.0032746 ]]
bias
[[0.0229222]]
Seeking
[[0.4]]
'''
#Cell is rerun again with
b2 = np.array([[-0.7]])
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.31647911]]
Error is
[[-0.08352089]]
Delta Out Node:
[[-0.01806725]]
Weight Increment:
[[-0.00054202]
[ 0.00180672]]
Bias Increment:
[[0.01264707]]
State after training rotation:
Weight
[[ 0.03054202]
[-0.10180672]]
bias
[[-0.71264707]]
Seeking
[[0.4]]
Epoch 2:
Output:
[[0.31347742]]
Error is
[[-0.08652258]]
Delta Out Node:
[[-0.01862047]]
Weight Increment:
[[-0.00056871]
[ 0.00189569]]
Bias Increment:
[[0.01326982]]
State after training rotation:
Weight
[[ 0.03111072]
[-0.10370241]]
bias
[[-0.72591689]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.01206264]]
Error is
[[-0.38793736]]
Delta Out Node:
[[-0.0046231]]
Weight Increment:
[[-0.00079352]
[ 0.00264508]]
Bias Increment:
[[0.01851554]]
State after training rotation:
Weight
[[ 0.17243664]
[-0.57478879]]
bias
[[-4.02352151]]
Seeking
[[0.4]]
Epoch 100:
Output:
[[0.01182232]]
Error is
[[-0.38817768]]
Delta Out Node:
[[-0.0045349]]
Weight Increment:
[[-0.00078198]
[ 0.00260661]]
Bias Increment:
[[0.01824629]]
State after training rotation:
Weight
[[ 0.17321862]
[-0.5773954 ]]
bias
[[-4.04176779]]
Seeking
[[0.4]]
'''
I can see that when the starting output is less than 0.5, training pushes the output lower no matter what the target is: the network only ever learns a value below its starting output. If the starting output is 0.5 or greater, it will only learn a value that is also greater than 0.5. And yet I still can't figure out the solution to that problem (an elegant one, at least).
These are the two cases of contention, so I could just brute force a fix. But then I wouldn't learn what mistake I'm making.
I know there are multiple ways to implement this network, and there even exists this ridiculously simple variant seen on this blog, whose math I still can't make sense of. But after working on this one thing for weeks, I can only assume it's some small error I've not been able to see.
Within the class definition, the following line was changed:
#derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
#derivError_out = delta_out * layers[-3].T*self._weights[-1]
derivError_out = delta_out * layers[-3].T
Changes
Proper derivations for incrementing neural network parameters can be found all around the internet. In the case of perceptrons, each bias is incremented by a "delta" value determined by the output node it is attached to. Rather than simply implementing this, I was incrementing each bias by the product of the bias itself and this "delta".
In my incompetence I wrote this expression as
delta_out = error * activation(layers[-1], deriv = True)
derivError_bias = delta_out * self._bias[-1].T
self._bias -= eta * derivError_bias
rather than self._bias -= eta * delta_out
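For reference, here is a sketch of why the bias increment is just the delta. It assumes a half squared-error loss and a single logistic output node; the post only defines error = output - target, so the loss itself is my assumption. The symbols o, t, z, x_i, w_i and b are my own notation for the output, target, weighted sum, inputs, weights and bias:

```latex
\begin{align*}
E &= \tfrac{1}{2}(o - t)^2, \qquad o = \sigma(z), \qquad z = \sum_i w_i x_i + b \\
\delta &= \frac{\partial E}{\partial z} = (o - t)\,\sigma'(z) = (o - t)\,o\,(1 - o) \\
\frac{\partial E}{\partial w_i} &= \delta\,\frac{\partial z}{\partial w_i} = \delta\,x_i \\
\frac{\partial E}{\partial b} &= \delta\,\frac{\partial z}{\partial b} = \delta
\end{align*}
```

Since the derivative of z with respect to b is 1, the bias gradient carries no factor of b. Multiplying by the bias anyway flips the sign of the update whenever the bias is negative, which is consistent with the run that starts from b2 = -0.7: the bias becomes more negative and the output moves away from the target.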
With this fix, the network can now learn any randomly assigned target (within the logistic function's output range of 0 to 1) with any randomly assigned weights and biases.
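To check the fix in isolation, here is a minimal sketch of the single-layer case only, not the ANN class itself. `train_single_layer` is a hypothetical helper of mine, and the default of 1000 epochs is an arbitrary choice. It uses `x.T.dot(delta)` for the weight gradient, which for a single input row gives the same values as the broadcasted product `delta_out * layers[-3].T`, and it reuses the starting weights, biases and targets from the three test runs:

```python
import numpy as np

def logistic(x, deriv=False):
    # With deriv=True, x must already be the output of logistic.
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def train_single_layer(x, w, b, target, epochs=1000, eta=1.0):
    """Hypothetical helper: gradient descent for one logistic layer."""
    w = w.astype(float).copy()  # copy so the caller's arrays are untouched
    b = b.astype(float).copy()
    for _ in range(epochs):
        out = logistic(x.dot(w) + b)
        delta = (out - target) * logistic(out, deriv=True)
        w -= eta * x.T.dot(delta)                    # weight gradient: x^T . delta
        b -= eta * delta.sum(axis=0, keepdims=True)  # bias gradient: delta, not delta * b
    return w, b

x = np.array([[1, 1]])
w0 = np.array([[0.03], [-0.1]])

# (starting bias, target) pairs taken from the three runs in the question
for b0, t in [(0.7, 0.7), (0.7, 0.4), (-0.7, 0.4)]:
    w, b = train_single_layer(x, w0, np.array([[b0]]), np.array([[t]]))
    print(f'bias {b0}, target {t}: output {logistic(x.dot(w) + b)}')
```

All three runs should end with the output matching the target to several decimal places, including the b2 = -0.7 case that previously drifted away from it.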