I am trying to implement backpropagation from scratch. While my cost is decreasing, the gradient check yields a whopping 0.767399376130221. I've been trying to figure out what's wrong and have managed to slim the code down to these few lines:
def forward(self, X, y):
    z2 = self.params_l1.dot(X.T)       # hidden-layer pre-activation
    a2 = self.sigmoid(z2)              # hidden-layer activation
    z3 = self.params_l2.dot(a2)        # output-layer pre-activation
    a3 = self.sigmoid(z3)              # network output (yh)
    loss = self.cross_entropy(a3, y)
    return a3, loss, z2, a2, z3

def backward(self, X, y):
    n_examples = len(X)
    yh, loss, Z2, A2, Z3 = self.forward(X, y)
    delta3 = np.multiply(-(yh - y), self.dsigmoid(Z3))            # output-layer delta
    delta2 = np.dot(self.params_l2.T, delta3) * self.dsigmoid(Z2) # hidden-layer delta
    de3 = np.dot(delta3, A2.T)         # gradient w.r.t. params_l2
    de2 = np.dot(delta2, X)            # gradient w.r.t. params_l1
    self.params_l2 = self.params_l2 - self.lr * (de3 / n_examples)
    self.params_l1 = self.params_l1 - self.lr * (de2 / n_examples)
    return de3 / n_examples, de2 / n_examples
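For reference, the 0.767 figure comes from comparing the gradients backward returns against numerical estimates, roughly along these lines (a sketch rather than my exact code; it assumes the usual centered-difference estimate, the relative-error metric ||a - b|| / (||a|| + ||b||), and that cross_entropy already averages the loss over examples so it matches the / n_examples in backward):

import numpy as np

def relative_error(a, b, eps=1e-12):
    # standard gradient-check metric: ||a - b|| / (||a|| + ||b||)
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b) + eps)

def gradient_check(model, X, y, h=1e-5):
    # backward() also applies an update, so keep copies and restore afterwards
    W1, W2 = model.params_l1.copy(), model.params_l2.copy()
    dW2, dW1 = model.backward(X, y)            # analytic grads, in backward's return order
    model.params_l1, model.params_l2 = W1, W2  # undo the parameter update

    num_dW1 = np.zeros_like(model.params_l1)
    num_dW2 = np.zeros_like(model.params_l2)
    for params, num in ((model.params_l1, num_dW1), (model.params_l2, num_dW2)):
        it = np.nditer(params, flags=['multi_index'])
        while not it.finished:
            idx = it.multi_index
            orig = params[idx]
            params[idx] = orig + h
            loss_plus = model.forward(X, y)[1]   # loss is the second value forward returns
            params[idx] = orig - h
            loss_minus = model.forward(X, y)[1]
            params[idx] = orig                   # restore this entry
            num[idx] = (loss_plus - loss_minus) / (2 * h)
            it.iternext()

    return relative_error(dW2, num_dW2), relative_error(dW1, num_dW1)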
It is a simple (2,2,1) MLP, I'm using cross-entropy as the loss function, and I am following the chain rule for the backprop. I suspect the problem may lie in the order in which I take the products, but I have tried every which way and still had no luck.
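Written out, the gradients that backward computes correspond to the following (with params_l1 written as $W^{(1)}$, params_l2 as $W^{(2)}$, $\sigma$ the sigmoid, $\odot$ elementwise multiplication, and $n$ the number of examples):

$$
\begin{aligned}
\delta_3 &= -(\hat{y} - y) \odot \sigma'(z_3), &\qquad \frac{\partial L}{\partial W^{(2)}} &= \frac{1}{n}\,\delta_3\, a_2^{\top},\\
\delta_2 &= \big(W^{(2)\top}\,\delta_3\big) \odot \sigma'(z_2), &\qquad \frac{\partial L}{\partial W^{(1)}} &= \frac{1}{n}\,\delta_2\, X,
\end{aligned}
$$

which is just the code above in equation form.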
I managed to get a difference of 1.7250119005319425e-10 by computing delta3 simply as yh - y, with no further multiplications. Now I need to figure out why that is.
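Concretely, the version that gives that tiny difference replaces the delta3 line in backward with nothing more than:

delta3 = yh - y   # no minus sign, no dsigmoid(Z3) factor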