python for-loop neural-network xor

A matrix that incorrectly contracts to a vector during a for loop - neural networks, Python


I'm trying to write a simple two-output XOR neural network in Python with no hidden layers. I have a weight matrix of shape (3, 2). Because there are two outputs, I compute two separate errors during training: the first error is supposed to update the first column of the weight matrix, and the second error the second column. So, inside the training for loop, I added an inner for loop that iterates over the errors (2 in this case). This inner loop is meant to update the weight matrix, but after the first iteration the weight matrix is, for a reason I can't work out, reduced to a vector containing only the updated first column.

Can someone help me fix this issue? The code:

import numpy as np

data = np.array([[0, 0],
                [0, 1],
                [1, 0],
                [1, 1]])
data = np.transpose(data)
print("Data:")
print(data)

#target = np.array([[0], [0], [0], [1]])     #AND
target = np.array([[0, 1],
                   [1, 0],
                   [1, 0],
                   [0, 1]])     #XOR
print("Targets:")
print(target)

# get the number of columns
numberOfColumns = data.shape[1]
print('Number of columns', numberOfColumns)

inputSize = data.shape[0]
print('Number of rows', inputSize)

# add the vector of ones to the data matrix - polarization
data = np.r_[data, np.ones((1, numberOfColumns), int)]


# sigmoid activation function
def sigmoid(x, derivative=False):
    v = 1 / (1 + np.exp(-x))

    if derivative:
        v = v * (1 - v)

    return v


# linear activation function
def linear(x, derivative=False):
    v = x

    if derivative:
        v = np.ones(x.shape)

    return v


eta = 0.1  # learning rate
epochs = 1000   #number of epochs
afun = sigmoid  #activation function
#afun = linear
w = np.array([[0.01, 0.2],   #net weights
              [0.1, 0.1],
              [0.2, 0.01]])



y = np.zeros(numberOfColumns)
e = 0
for j in range(numberOfColumns):
    y = afun(np.dot(w.transpose(), data))
    y = np.transpose(y)
    #print("Y:")
    #print(y)
    e = e + 0.5 * np.power((y[j, ] - target[j]), 2)
    #print("Error:")
    #print(e)

e = np.mean(e)

print("\nBefore training")
print("Weights:")
print(w)
print("Y:")
print(y)
print("Error:")
print(e)

#####################################
print("\nData after polarization:")
print(data)

#####################################
print("\nLearning")

for epoch in range(epochs):
    for j in range(numberOfColumns):
        y = afun(np.dot(data[:, j], w))
        #print("Y:")
        #print(y)
        error = y - target[j]
        print("Error:")
        print(error)
        print("Error shape:")
        print(error.shape[0])
        print("w shape:")
        print(w.shape)
        for i in range(error.shape[0]):
            dw = eta * error[i] * np.multiply(afun(data[:, j], True), data[:, j])
            print("dw:")
            print(dw)
            print("w before:")
            print(w)
            w = w[:, i]-np.transpose(dw)
            print("w after:")
            print(w)

# Checking the learning
y = np.zeros(numberOfColumns)
e = 0
for j in range(numberOfColumns):
    y = afun(np.dot(w.transpose(), data))
    y = np.transpose(y)
    #print("Y:")
    #print(y)
    e = e + 0.5*np.power((y[j, ] - target[j]), 2)

e = np.mean(e)

print("\nAfter training")
print("Weights:")
print(w)
print("Net output:")
print(y)
print("Error:")
print(e)

I honestly don't understand the issue. I don't know why the weight matrix is reduced to a vector containing only its updated first column. I expected the first iteration to update the first column of the weight matrix and the second iteration to update the second column.


Solution

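  • You are redefining w on line 106 in your code:

    w = w[:, i] - np.transpose(dw)

    The right-hand side is a 1-D array of shape (3,), and this statement rebinds the name w to it, so the (3, 2) matrix is replaced. I imagine that should be:

    w[:, i] = w[:, i] - np.transpose(dw)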
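
The difference between the two statements can be checked in isolation. The snippet below is a minimal sketch, not the full training loop: it reuses the initial weights from the question, while the values in dw and the names w_rebound and w_fixed are made up purely for illustration.

```python
import numpy as np

# Initial weights from the question, shape (3, 2)
w = np.array([[0.01, 0.2],
              [0.1, 0.1],
              [0.2, 0.01]])

# Hypothetical update vector, one entry per row of w (illustrative values only)
dw = np.array([0.001, 0.002, 0.003])

# What the original line does: the result is a new 1-D array of shape (3,).
# Assigning it back to the name w would discard the matrix.
w_rebound = w[:, 0] - np.transpose(dw)
print(w_rebound.shape)

# Slice assignment writes into column i of the existing array instead,
# so the array keeps its (3, 2) shape across both iterations.
w_fixed = w.copy()
for i in range(w_fixed.shape[1]):
    w_fixed[:, i] = w_fixed[:, i] - np.transpose(dw)
print(w_fixed.shape)
```

Since dw is one-dimensional, np.transpose(dw) returns it unchanged, so the call can be dropped; w[:, i] -= dw performs the same in-place column update.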