python-3.x, machine-learning, neural-network, backpropagation, scipy-optimize

scipy.optimize.minimize() not converging, giving success=False


I recently tried to implement the backpropagation algorithm in Python. I tried fmin_tnc and BFGS, but neither of them worked, so please help me figure out the problem.

    import numpy as np

    def sigmoid(Z):
        return 1 / (1 + np.exp(-Z))

    def costFunction(nnparams, X, y, input_layer_size=400, hidden_layer_size=25,
                     num_labels=10, lamda=1):
        # Unroll the flat parameter vector into the two weight matrices
        Theta1 = np.reshape(nnparams[0:hidden_layer_size * (input_layer_size + 1)],
                            (hidden_layer_size, input_layer_size + 1))
        Theta2 = np.reshape(nnparams[hidden_layer_size * (input_layer_size + 1):],
                            (num_labels, hidden_layer_size + 1))
        m = X.shape[0]
        y = y.reshape(m, 1)

        # Forward pass, vectorised over all examples
        X = np.concatenate([np.ones([m, 1]), X], 1)            # add bias column
        a2 = sigmoid(Theta1.dot(X.T))
        a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])   # add bias row
        h = sigmoid(Theta2.dot(a2))                             # (num_labels, m)

        # One-hot encode the labels 1..num_labels
        c = np.array(range(1, num_labels + 1))
        y = (y == c)

        # Cross-entropy cost plus L2 regularisation (bias columns excluded),
        # so the cost stays consistent with the regularised gradient below
        J = 0
        for i in range(m):
            J = J + (-1 / m) * np.sum(y[i, :] * np.log(h[:, i])
                                      + (1 - y[i, :]) * np.log(1 - h[:, i]))
        J = J + (lamda / (2 * m)) * (np.sum(Theta1[:, 1:] ** 2) + np.sum(Theta2[:, 1:] ** 2))

        # Backpropagation, one example at a time
        DEL1 = np.zeros(Theta1.shape)
        DEL2 = np.zeros(Theta2.shape)
        for i in range(m):
            z2 = Theta1.dot(X[i, :].T)
            a2 = sigmoid(z2).reshape(-1, 1)
            a2 = np.concatenate([np.ones([1, a2.shape[1]]), a2])   # add bias unit
            z3 = Theta2.dot(a2)
            a3 = sigmoid(z3).reshape(-1, 1)
            delta3 = a3 - y[i, :].reshape(-1, 1)
            # a2 * (1 - a2) is sigmoid'(z2); the bias entry comes out as 0 and is dropped
            delta2 = Theta2.T.dot(delta3) * (a2 * (1 - a2))
            DEL2 = DEL2 + delta3.dot(a2.T)
            DEL1 = DEL1 + delta2[1:, :].dot(X[i, :].reshape(1, -1))   # outer product with the input

        # Average and regularise everything except the bias column
        Theta1_grad = np.zeros(Theta1.shape)
        Theta2_grad = np.zeros(Theta2.shape)
        Theta1_grad[:, 0] = DEL1[:, 0] * (1 / m)
        Theta1_grad[:, 1:] = DEL1[:, 1:] * (1 / m) + (lamda / m) * Theta1[:, 1:]
        Theta2_grad[:, 0] = DEL2[:, 0] * (1 / m)
        Theta2_grad[:, 1:] = DEL2[:, 1:] * (1 / m) + (lamda / m) * Theta2[:, 1:]

        # Unroll the gradients into one flat vector for the optimizer
        grad = np.concatenate([Theta1_grad.ravel(), Theta2_grad.ravel()])
        return J, grad
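
One way to sanity-check a cost/gradient pair like the one above before handing it to the optimizer is numerical gradient checking. Below is a minimal sketch against the costFunction above; the tiny layer sizes and the numerical_grad helper are illustrative choices, not part of the original setup:

    import numpy as np

    def numerical_grad(f, theta, eps=1e-4):
        # central finite differences, one parameter at a time
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            step = np.zeros_like(theta)
            step[i] = eps
            grad[i] = (f(theta + step)[0] - f(theta - step)[0]) / (2 * eps)
        return grad

    # tiny random problem: 3 inputs, 5 hidden units, 10 classes, 4 examples
    rng = np.random.default_rng(0)
    X_small = rng.random((4, 3))
    y_small = np.array([1, 3, 7, 10])
    n_params = 5 * (3 + 1) + 10 * (5 + 1)
    theta0 = rng.random(n_params) * 0.1

    def f(t):
        return costFunction(t, X_small, y_small, input_layer_size=3,
                            hidden_layer_size=5, num_labels=10, lamda=0)

    J, analytic = f(theta0)
    numeric = numerical_grad(f, theta0)
    # the largest difference should be tiny (roughly 1e-9 or less)
    # if the analytic gradient matches the cost
    print(np.max(np.abs(analytic - numeric)))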

This is how I called the function (op is scipy.optimize):

    r2 = op.minimize(fun=costFunction, x0=nnparams, args=(X, dataY.flatten()),
                     method='TNC', jac=True, options={'maxiter': 400})

r2 looks like this:

        fun: 3.1045444063663266
        jac: array([[-6.73218494e-04],
               [-8.93179045e-05],
               [-1.13786179e-04],
               ...,
               [ 1.19577741e-03],
               [ 5.79555099e-05],
               [ 3.85717533e-03]])
    message: 'Linear search failed'
       nfev: 140
        nit: 5
     status: 4
    success: False
          x: array([-0.97996948, -0.44658952, -0.5689309 , ...,  0.03420931,
           -0.58005183, -0.74322735])

Please help me find the correct way of minimizing this function. Thanks in advance.


Solution

  • Finally solved it. The problem was that I used np.random.randn() to generate the random initial Theta values, which draws from a standard normal distribution. Too many of the values fell in the same range, which made the theta values nearly symmetric, and because of that symmetry the optimization terminated in the middle of the process. The simple fix was to use np.random.rand() (which draws from a uniform distribution) instead of np.random.randn().
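
For reference, here is a minimal sketch of this kind of symmetry-breaking initialization, assuming the 400-25-10 architecture from the question; the rand_initialize_weights helper and the epsilon_init scaling factor are illustrative choices, not details given in the answer above:

    import numpy as np

    def rand_initialize_weights(fan_in, fan_out, epsilon_init=0.12):
        # uniform values in [-epsilon_init, epsilon_init]; unlike randn(),
        # rand() keeps the initial weights in a small bounded range and avoids
        # the near-symmetric starting point described above
        return np.random.rand(fan_out, fan_in + 1) * 2 * epsilon_init - epsilon_init

    input_layer_size, hidden_layer_size, num_labels = 400, 25, 10
    Theta1_init = rand_initialize_weights(input_layer_size, hidden_layer_size)
    Theta2_init = rand_initialize_weights(hidden_layer_size, num_labels)
    nnparams = np.concatenate([Theta1_init.ravel(), Theta2_init.ravel()])

A parameter vector initialized this way can then be passed as x0 to op.minimize exactly as in the question.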