I implemented a batch back-propagation algorithm for a neural network with one hidden layer and sigmoid activation functions. The output layer is a one-hot sigmoid layer. The net input of the first layer is z1; after applying the sigmoid it becomes a1. Similarly, we have z2 and a2 for the second layer.
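To make the notation concrete, here is a minimal sketch of that forward pass. The return values and the model dictionary keys (w1, b1, w2, b2) match the code below; the sigmoid helper and the weight shapes are my assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(model, x):
    # assumed shapes: x is (train_size, n_in), w1 is (n_in, n_hidden),
    # w2 is (n_hidden, n_out); b1 and b2 are row vectors that broadcast over rows
    w1, b1, w2, b2 = model['w1'], model['b1'], model['w2'], model['b2']
    z1 = np.matmul(x, w1) + b1   # net input of the hidden layer
    a1 = sigmoid(z1)             # hidden-layer activation
    z2 = np.matmul(a1, w2) + b2  # net input of the output layer
    a2 = sigmoid(z2)             # one-hot sigmoid output
    return z1, a1, z2, a2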
The back-propagation process is like this:
x, y = train_data, train_target
for i in range(0, num_passes):
    # forward pass
    z1, a1, z2, a2 = predict(current_model, x)
    # backward pass: propagate the error from the output layer to the hidden layer
    derv_out2 = (y - a2) * (a2 * (1 - a2))
    delta2 = np.matmul(np.transpose(a1), derv_out2) / train_size
    dw2 = delta2 + reg_lambda * w2
    db2 = np.mean(b2 * derv_out2, 0)
    derv_out1 = a1 * np.reshape(np.sum(delta2 * w2, 1), [1, a1.shape[1]])
    delta1 = np.matmul(np.transpose(x), derv_out1) / train_size
    dw1 = delta1 + reg_lambda * w1
    db1 = np.mean(b1 * derv_out1, 0)
    # gradient descent parameter update
    w1 += learning_rate * dw1
    b1 += learning_rate * db1
    w2 += learning_rate * dw2
    b2 += learning_rate * db2
    # assign new parameters to the model
    current_model = {'w1': w1, 'b1': b1, 'w2': w2, 'b2': b2}
complete code file: link
The loss of the above algorithm is decreasing, but the classification accuracy is about the same as random guessing. What is the problem?
Based on @bivouac0's comment, I tried tuning the learning rate. I found that a learning rate of 0.1 or 0.01 is too low for the early steps (it causes a long training time). I then implemented an adaptive approach to tune the learning rate (increase the rate when the loss is descending and decrease it when it is ascending). With this approach the accuracy improved significantly.
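For reference, this is roughly the adaptive rule I mean; it is only a sketch, and the multiplicative factors (1.05 and 0.5) as well as the helper name adapt_learning_rate are arbitrary choices for illustration:

def adapt_learning_rate(learning_rate, loss, prev_loss, grow=1.05, shrink=0.5):
    # simple multiplicative heuristic: speed up while the loss is descending,
    # back off as soon as it ascends
    if loss < prev_loss:
        return learning_rate * grow
    return learning_rate * shrink

Inside the training loop this is called once per pass, e.g. learning_rate = adapt_learning_rate(learning_rate, loss, prev_loss) followed by prev_loss = loss.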