
Writing a basic XOR neural network program


I am trying to write, from scratch, a neural network that learns the XOR function. The full code is here (in Python 3).

I am currently getting this error:

ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients

I am new to TensorFlow and I don't understand why this happens. Can anyone help me correct my code? Thanks in advance.

P.S. If more details are required in the question, do let me know before downvoting. Thanks again!

Edit: here is the relevant part of the code:

def initialize_parameters():
    # Create Weights and Biases for Hidden Layer and Output Layer
    W1 = tf.get_variable("W1", [2, 2], initializer = tf.contrib.layers.xavier_initializer())
    b1 = tf.get_variable("b1", [2, 1], initializer = tf.zeros_initializer())
    W2 = tf.get_variable("W2", [1, 2], initializer = tf.contrib.layers.xavier_initializer())
    b2 = tf.get_variable("b2", [1, 1], initializer = tf.zeros_initializer())
    parameters = {
            "W1" : W1,
            "b1" : b1,
            "W2" : W2,
            "b2" : b2
    }
    return parameters

def forward_propogation(X, parameters):

    threshold = tf.constant(0.5, name = "threshold")
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    tf.squeeze(A1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.round(tf.sigmoid(Z2))
    print(A2.shape)
    tf.squeeze(A2)
    A2 = tf.reshape(A2, [1, 1])
    print(A2.shape)
    return A2

def compute_cost(A, Y):

    logits = tf.transpose(A)
    labels = tf.transpose(Y)
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits = logits, labels = labels)
    return cost

def model(X_train, Y_train, X_test, Y_test, learning_rate = 0.0001, num_epochs = 1500):

    ops.reset_default_graph()
    (n_x, m) = X_train.shape
    n_y = Y_train.shape[0]
    costs = []
    X, Y = create_placeholders(n_x, n_y)
    parameters = initialize_parameters()
    A2 = forward_propogation(X, parameters)
    cost = compute_cost(A2, Y)
    optimizer = tf.train.AdamOptimizer(learning_rate = learning_rate).minimize(cost)
    init = tf.global_variables_initializer()

    with tf.Session() as session:
        session.run(init)
        for epoch in range(num_epochs):
            epoch_cost = 0
            _, epoch_cost = session.run([optimizer, cost], feed_dict = {X : X_train, Y : Y_train})
        parameters = session.run(parameters)
        correct_prediction = tf.equal(tf.argmax(A2), tf.argmax(Y))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Training Accuracy is {0} %...".format(accuracy.eval({X : X_train, Y : Y_train})))
        print("Test Accuracy is {0} %...".format(accuracy.eval({X : X_test, Y : Y_test})))
    return parameters

Solution

  • The error is caused by the use of tf.round when you define A2 (a known issue, by the way). tf.round is not differentiable, so TensorFlow cannot backpropagate through it, and the optimizer is left with no gradients for any of the variables.

    In this particular task, the solution is simply not to use tf.round at all. Remember that the output of tf.sigmoid is a value between 0 and 1, which can be interpreted as the probability of the result being 1. The cross-entropy loss measures the distance from this probability to the target (0 or 1), and the weight updates are computed from that distance. Calling tf.round before the cross-entropy squeezes the probability to exactly 0 or 1, which makes the cross-entropy pretty meaningless.

    By the way, tf.losses.softmax_cross_entropy should work better here, because you've already applied the sigmoid yourself in the second layer. A rough sketch of the corrected forward pass and cost is given below.
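For concreteness, here is a minimal sketch of how forward_propogation and compute_cost could look with tf.round removed, written against the same TF 1.x API the question uses. It goes slightly beyond the answer's minimal fix: it returns the raw logits Z2 and lets sigmoid_cross_entropy_with_logits apply the sigmoid internally, and it adds a hypothetical predict helper so that thresholding happens only at prediction time, outside the loss path.

import tensorflow as tf

def forward_propogation(X, parameters):
    # Same structure as the original, but the output is the raw logit Z2:
    # no tf.round (non-differentiable) and no sigmoid before the loss.
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)   # shape (1, m), raw logits
    return Z2

def compute_cost(Z2, Y):
    # sigmoid_cross_entropy_with_logits applies the sigmoid itself, so it
    # receives raw logits; reduce_mean collapses the per-example losses
    # into the single scalar the optimizer minimizes.
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(logits=Z2, labels=Y))

def predict(Z2, threshold=0.5):
    # The 0/1 decision is made only when evaluating predictions, so it
    # never sits between the cost and the trainable variables.
    return tf.cast(tf.greater(tf.sigmoid(Z2), threshold), tf.float32)

With this structure there is always a differentiable path from the cost back to W1, b1, W2 and b2, so the "No gradients provided for any variable" error goes away.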