I recently tried to implement the backpropagation algorithm in Python. I tried fmin_tnc and BFGS, but neither of them actually worked, so please help me figure out the problem.
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def costFunction(nnparams, X, y, input_layer_size=400, hidden_layer_size=25,
                 num_labels=10, lamda=1):
    # Unroll the flat parameter vector into the two weight matrices
    Theta1 = np.reshape(nnparams[0:hidden_layer_size * (input_layer_size + 1)],
                        (hidden_layer_size, input_layer_size + 1))
    Theta2 = np.reshape(nnparams[hidden_layer_size * (input_layer_size + 1):],
                        (num_labels, hidden_layer_size + 1))
    m = X.shape[0]
    J = 0
    y = y.reshape(m, 1)

    # Forward pass over the whole training set
    X = np.concatenate([np.ones([m, 1]), X], 1)            # bias column
    a2 = sigmoid(Theta1.dot(X.T))
    a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])   # bias row
    h = sigmoid(Theta2.dot(a2))

    # One-hot encode the labels (classes 1..num_labels)
    c = np.arange(1, num_labels + 1)
    y = (y == c)

    # Cross-entropy cost
    for i in range(y.shape[0]):
        J = J + (-1 / m) * np.sum(y[i, :] * np.log(h[:, i])
                                  + (1 - y[i, :]) * np.log(1 - h[:, i]))
    # Regularization term, so J stays consistent with the regularized
    # gradient below (bias columns excluded)
    J = J + (lamda / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2)
                                 + np.sum(Theta2[:, 1:] ** 2))

    # Backpropagation, one example at a time
    DEL2 = np.zeros(Theta2.shape)
    DEL1 = np.zeros(Theta1.shape)
    for i in range(m):
        z2 = Theta1.dot(X[i, :].T)
        a2 = sigmoid(z2).reshape(-1, 1)
        a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])  # bias unit
        z3 = Theta2.dot(a2)
        a3 = sigmoid(z3).reshape(-1, 1)
        delta3 = a3 - y[i, :].reshape(-1, 1)               # output-layer error
        delta2 = Theta2.T.dot(delta3) * (a2 * (1 - a2))    # hidden-layer error
        DEL2 = DEL2 + delta3.dot(a2.T)
        # Drop the bias-unit error; broadcasts to Theta1's shape
        DEL1 = DEL1 + delta2[1:, :] * X[i, :]

    # Regularized gradients; the bias column is not regularized
    Theta1_grad = np.zeros(np.shape(Theta1))
    Theta2_grad = np.zeros(np.shape(Theta2))
    Theta1_grad[:, 0] = DEL1[:, 0] * (1 / m)
    Theta1_grad[:, 1:] = DEL1[:, 1:] * (1 / m) + (lamda / m) * Theta1[:, 1:]
    Theta2_grad[:, 0] = DEL2[:, 0] * (1 / m)
    Theta2_grad[:, 1:] = DEL2[:, 1:] * (1 / m) + (lamda / m) * Theta2[:, 1:]

    # Unroll the gradients into a single flat vector for the optimizer
    grad = np.concatenate([Theta1_grad.reshape(-1), Theta2_grad.reshape(-1)])
    return J, grad
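As a sanity check, the analytic gradient returned above can be compared against a finite-difference approximation on a tiny network. This is a minimal sketch; the small layer sizes, random data, and the numerical_gradient helper are illustrative assumptions, not part of the original setup:

import numpy as np

def numerical_gradient(f, theta, eps=1e-4):
    # Central finite differences of the scalar cost, one parameter at a time
    num_grad = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        num_grad[i] = (f(theta + step)[0] - f(theta - step)[0]) / (2 * eps)
    return num_grad

# Tiny network (3 inputs, 4 hidden units, 2 labels) keeps the check fast
rng = np.random.default_rng(0)
Xs = rng.random((5, 3))
ys = rng.integers(1, 3, size=5)                  # labels in {1, 2}
n = 4 * (3 + 1) + 2 * (4 + 1)                    # total parameter count
theta0 = rng.random(n) * 0.24 - 0.12

f = lambda t: costFunction(t, Xs, ys, input_layer_size=3,
                           hidden_layer_size=4, num_labels=2, lamda=1)
print(np.max(np.abs(f(theta0)[1] - numerical_gradient(f, theta0))))

If the backpropagation is right, the printed difference should be tiny (around 1e-9 or smaller).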
This is how I called the function (op is scipy.optimize):

r2 = op.minimize(fun=costFunction, x0=nnparams, args=(X, dataY.flatten()),
                 method='TNC', jac=True, options={'maxiter': 400})
The result r2 looks like this:
fun: 3.1045444063663266
jac: array([[-6.73218494e-04],
[-8.93179045e-05],
[-1.13786179e-04],
...,
[ 1.19577741e-03],
[ 5.79555099e-05],
[ 3.85717533e-03]])
message: 'Linear search failed'
nfev: 140
nit: 5
status: 4
success: False
x: array([-0.97996948, -0.44658952, -0.5689309 , ..., 0.03420931,
-0.58005183, -0.74322735])
Please help me find the correct way of minimizing this function. Thanks in advance.
Finally solved it. The problem was that I used np.random.randn() to generate the initial Theta values, which draws from a standard normal distribution; too many values fell within the same range, which made the theta values nearly symmetric. Because of this symmetry, the optimization terminated in the middle of the process. The simple solution was to use np.random.rand() (which draws from a uniform distribution) instead of np.random.randn().
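A minimal sketch of that initialization change (the epsilon scaling and the helper name are my own illustrative choices, not from the original code):

import numpy as np

def rand_initialize_weights(fan_in, fan_out, epsilon=0.12):
    # Uniform values in [-epsilon, epsilon]; np.random.rand breaks the
    # near-identical starting weights that np.random.randn produced here
    return np.random.rand(fan_out, fan_in + 1) * 2 * epsilon - epsilon

Theta1_init = rand_initialize_weights(400, 25)   # input -> hidden
Theta2_init = rand_initialize_weights(25, 10)    # hidden -> output
nnparams = np.concatenate([Theta1_init.reshape(-1), Theta2_init.reshape(-1)])

The resulting nnparams vector (25*401 + 10*26 = 10285 values) matches the unrolled shape that costFunction expects, and can be passed as x0 to op.minimize as shown above.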