python, python-3.x, neural-network, perceptron

Perceptron Neural Network will not learn values in a specific range


I'm playing around with Neural Nets and wanted to make a clean class implementation to handle any size net. Currently, I'm debugging my learning function to deal with 2-Layer networks.

In its current state, using logistic activation:

  • It cannot learn values below 0.5
  • It cannot handle matrices of input vectors (only single input vectors); this can be implemented later
  • If initial weights and bias result in output less than 0.5, it will likely learn towards 0
  • Assumption: In "ideal" conditions, it will learn any value between 0.5 and 1 using any combination of binary input
    • This has been tested with 2 and 3 inputs to network
  • Does proper forward propagation regardless of number of layers

Here's the relevant code:

import numpy as np

def logistic(x, deriv = False):
  '''
  If using the derivative, input must already be the result of logistic,
  since d/dz logistic(z) = logistic(z) * (1 - logistic(z))
  '''
  if deriv:
    return x*(1-x)
    
  return 1/(1+np.exp(-x))

def feed_forw(input, weights):
  '''
  ***Wrapper for input.dot(weights)
  Input should be a np.array the same length as number of input nodes
    - A row of input represents the vector of input nodes
    - Different Rows are different input cases
  Weights is a 2D np.array of weights for each input node to each output node
    - dimensions of weights will determine length of output vector
    - top row is weights going from the first input node to all output nodes
    - first col is weights going from all input nodes to first output node
  '''

  return input.dot(weights)

class ANN:
  '''
  Artificial Neural Network of Perceptron Design
  Member Attributes:
    Weights: tuple of np.array
    - # of elements define number of layers
    - shapes of each element define the nodes of each pair of connected layers
    Bias: tuple of np.array
    - added to each node after the first layer on a per layer basis
    - must have same dimensions as output from each corresponding element in Weights
    Target: np.array
    - array representing desired output.
  '''
  
  def __init__(self, weights, bias = 0, target = None):
    self._weights = weights
    self._bias = bias
    self._target = target

  def __str__(self):
    data = ''
    for w,b in zip(self._weights, self._bias):
      data += f'Weight\n{w}\nbias\n{b}\n'

    return f'{data}Seeking\n{self._target}\n'

  def _forwardProp(self, v, activation):
    '''
    Helper function to Learn
    '''
    out = []
    out.append(v.copy())
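    #out grows as [input, weighted sum, activation, weighted sum, activation, ...]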
    for w,b in zip(self._weights, self._bias):
      out.append(feed_forw(out[-1], w) + b)
      out.append(activation(out[-1]))
    return out

  def setTarget(self, target):
    self._target = target

  def learn(self, input, activation, epoch = 10, eta = 1, debug = False):
    '''
    ***Currently only functions with 2-Layer perceptrons***
    input: np.array
    - a matrix representing each case of input vectors
    - rows are input vectors for a single case
    activation: function object
    - An activation function used to normalize output
    epoch: int
    - number of training passes
    eta: float
    - learning rate
    '''
    for e in range(epoch):
      layers = self._forwardProp(input, activation)
      #layers keeps every intermediate result of the forward pass
      #Pattern:
      #[input (layer 0), weighted sum into layer 1, activation of layer 1,
      #   weighted sum into layer 2, activation of layer 2, ...,
      #   weighted sum into output layer, activation of output layer]
      
      #Final element is always network output
      error = layers[-1] - self._target

      delta_out = error * activation(layers[-1], deriv = True)
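      #delta_out = dE/d(weighted sum) at the output, assuming squared error E = 0.5*(y - t)**2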
      #derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
      #derivError_out = delta_out * layers[-3].T*self._weights[-1]
      #EDIT
      derivError_out = delta_out * layers[-3].T
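      #dE/dW for the output layer: delta times the activations feeding it (here, the input vector)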
      derivError_bias = delta_out * self._bias[-1].T
      self._weights += -eta*derivError_out
      self._bias += -eta*derivError_bias

      if debug:
        print(f'Epoch {e+1}:\nOutput:\n{layers[-1]}\nError is\n{error}\nDelta Out Node:\n{delta_out}')
        print(f'Weight Increment:\n{derivError_out}\nBias Increment:\n{derivError_bias}')
        print(f'State after training rotation:\n{self}')

      #i = 1
      #while i < len(layers) + 1:
        #This loop will count from the last element of layers, will go back by 2
        #...
        #i += 2

The code used to test and its output:

w2 = np.array([[0.03],
               [-0.1]])
b2 = np.array([[0.7]])
nn1 = ANN((w2,), (b2,))
x = np.array([[1,1]])
t = np.array([[0.7]])
nn1.setTarget(t)
nn1.learn(x, logistic, 100, debug = True)
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[-0.04751054]]
Delta Out Node:
[[-0.01077287]]
Weight Increment:
[[-0.00032319]
 [ 0.00107729]]
Bias Increment:
[[-0.00754101]]
State after training rotation:
Weight
[[ 0.03032319]
 [-0.10107729]]
bias
[[0.70754101]]
Seeking
[[0.7]]

Epoch 2:
Output:
[[0.65402678]]
Error is
[[-0.04597322]]
Delta Out Node:
[[-0.01040263]]
Weight Increment:
[[-0.00031544]
 [ 0.00105147]]
Bias Increment:
[[-0.00736028]]
State after training rotation:
Weight
[[ 0.03063863]
 [-0.10212876]]
bias
[[0.71490129]]
Seeking
[[0.7]]
...
Epoch 99:
Output:
[[0.69871509]]
Error is
[[-0.00128491]]
Delta Out Node:
[[-0.00027049]]
Weight Increment:
[[-1.08348447e-05]
 [ 3.61161491e-05]]
Bias Increment:
[[-0.00025281]]
State after training rotation:
Weight
[[ 0.04006734]
 [-0.13355782]]
bias
[[0.93490471]]
Seeking
[[0.7]]

Epoch 100:
Output:
[[0.69876299]]
Error is
[[-0.00123701]]
Delta Out Node:
[[-0.00026038]]
Weight Increment:
[[-1.04328444e-05]
 [ 3.47761479e-05]]
Bias Increment:
[[-0.00024343]]
State after training rotation:
Weight
[[ 0.04007778]
 [-0.13359259]]
bias
[[0.93514815]]
Seeking
[[0.7]]
'''
#This cell is rerun with
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.65248946]]
Error is
[[0.25248946]]
Delta Out Node:
[[0.05725122]]
Weight Increment:
[[ 0.00171754]
 [-0.00572512]]
Bias Increment:
[[0.04007585]]
State after training rotation:
Weight
[[ 0.02828246]
 [-0.09427488]]
bias
[[0.65992415]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.64426676]]
Error is
[[0.24426676]]
Delta Out Node:
[[0.05598279]]
Weight Increment:
[[ 0.00158333]
 [-0.00527777]]
Bias Increment:
[[0.0369444]]
State after training rotation:
Weight
[[ 0.02669913]
 [-0.08899711]]
bias
[[0.62297975]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.50544009]]
Error is
[[0.10544009]]
Delta Out Node:
[[0.0263569]]
Weight Increment:
[[ 2.73123106e-05]
 [-9.10410354e-05]]
Bias Increment:
[[0.00063729]]
State after training rotation:
Weight
[[ 0.00100894]
 [-0.00336312]]
bias
[[0.02354185]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.50529672]]
Error is
[[0.10529672]]
Delta Out Node:
[[0.02632123]]
Weight Increment:
[[ 2.65564469e-05]
 [-8.85214898e-05]]
Bias Increment:
[[0.00061965]]
State after training rotation:
Weight
[[ 0.00098238]
 [-0.0032746 ]]
bias
[[0.0229222]]
Seeking
[[0.4]]
'''
#Cell is rerun again with
b2 = np.array([[-0.7]])
t = np.array([[0.4]])
'''
Epoch 1:
Output:
[[0.31647911]]
Error is
[[-0.08352089]]
Delta Out Node:
[[-0.01806725]]
Weight Increment:
[[-0.00054202]
 [ 0.00180672]]
Bias Increment:
[[0.01264707]]
State after training rotation:
Weight
[[ 0.03054202]
 [-0.10180672]]
bias
[[-0.71264707]]
Seeking
[[0.4]]

Epoch 2:
Output:
[[0.31347742]]
Error is
[[-0.08652258]]
Delta Out Node:
[[-0.01862047]]
Weight Increment:
[[-0.00056871]
 [ 0.00189569]]
Bias Increment:
[[0.01326982]]
State after training rotation:
Weight
[[ 0.03111072]
 [-0.10370241]]
bias
[[-0.72591689]]
Seeking
[[0.4]]
...
Epoch 99:
Output:
[[0.01206264]]
Error is
[[-0.38793736]]
Delta Out Node:
[[-0.0046231]]
Weight Increment:
[[-0.00079352]
 [ 0.00264508]]
Bias Increment:
[[0.01851554]]
State after training rotation:
Weight
[[ 0.17243664]
 [-0.57478879]]
bias
[[-4.02352151]]
Seeking
[[0.4]]

Epoch 100:
Output:
[[0.01182232]]
Error is
[[-0.38817768]]
Delta Out Node:
[[-0.0045349]]
Weight Increment:
[[-0.00078198]
 [ 0.00260661]]
Bias Increment:
[[0.01824629]]
State after training rotation:
Weight
[[ 0.17321862]
 [-0.5773954 ]]
bias
[[-4.04176779]]
Seeking
[[0.4]]
'''

I can see that when the output is less than 0.5, for some reason, training drives the output even lower no matter what. If the starting output is less than 0.5, it will only learn a value that is less than the starting output. If the starting output is 0.5 or greater, it will only learn a value that is also greater than 0.5. And yet, I still can't figure out the solution to that problem (elegantly, at least).

These are the two cases of contention, so I could just brute-force a fix. But then I wouldn't learn what mistake I'm making.

I know there are multiple ways to implement this network, and there even exists a ridiculously simple variant seen on this blog whose math I still can't make sense of. But, after working on this one thing for weeks, I can only assume it's some small error I've not been able to see.

This edit takes one step toward the solution.

Within the class definition, the following line was changed:

#derivError_out = delta_out * activation(layers[-3].T*self._weights[-1])
#derivError_out = delta_out * layers[-3].T*self._weights[-1]
derivError_out = delta_out * layers[-3].T

Changes

  • When initial output is 0.5 or greater, the network can learn any value between 0 and 1. Yay!
  • When initial output is less than 0.5, the network can only learn values up to a point "not too much greater than the starting output"
    • This behavior depends on the weights, and there seems to be an upper limit, set by the weights, beyond which the network can't learn. It will converge to 0 when trying to learn a value greater than that limit

Solution

  • Proper derivations for incrementing neural network parameters can be found all around the internet. In the case of perceptrons, each bias is incremented by a "delta" value determined by the output node it is attached to. Rather than simply implementing this, my bias nodes were being incremented by the product of the bias itself and this "delta".

    In my incompetence I wrote this expression as

    delta_out = error * activation(layers[-1], deriv = True)
    derivError_bias = delta_out * self._bias[-1].T
    self._bias -= eta * derivError_bias
    

    rather than self._bias -= eta * delta_out

    The network can now learn any randomly assigned target with any randomly assigned weights and biases. A stripped-down sketch of the corrected update is included below.
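Here is a minimal sketch of that corrected single-layer update outside the class, assuming the same squared-error setup as above (the names x, w, b, t, eta are local to this demo and the epoch count of 1000 is arbitrary). It reruns the case that previously collapsed to 0: a target of 0.4 with a starting bias of -0.7. The weight update is written as x.T.dot(delta) so the same line would also handle a batch of input rows.

import numpy as np

def logistic(x, deriv = False):
  '''
  If using the derivative, x must already be the output of logistic
  '''
  if deriv:
    return x*(1-x)

  return 1/(1+np.exp(-x))

#previously failing case: target below 0.5 with a negative starting bias
x = np.array([[1, 1]])            #single input case
w = np.array([[0.03], [-0.1]])    #weights
b = np.array([[-0.7]])            #bias
t = np.array([[0.4]])             #target
eta = 1

for e in range(1000):
  y = logistic(x.dot(w) + b)                   #forward pass
  delta = (y - t) * logistic(y, deriv = True)  #dE/d(weighted sum) for E = 0.5*(y - t)**2
  w -= eta * x.T.dot(delta)                    #dE/dw = inputs transposed, dotted with delta
  b -= eta * delta                             #dE/db = delta, with no extra factor of b

print(y)   #approaches [[0.4]]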