Tags: python, tensorflow, keras, generative-adversarial-network, dropout

Dropout layer directly in TensorFlow: how to train?


After creating my model in Keras, I want to get the gradients and apply them directly in TensorFlow with the tf.train.AdamOptimizer class. However, since I am using a Dropout layer, I don't know how to tell the model whether it is in training mode or not. The training keyword is not accepted. This is the code:

    import tensorflow as tf
    from tensorflow.keras.layers import Input, Dense, ReLU, Dropout
    from tensorflow.keras import Model

    net_input = Input(shape=(1,))
    net_1 = Dense(50)
    net_2 = ReLU()
    net_3 = Dropout(0.5)
    net = Model(net_input, net_3(net_2(net_1(net_input))))

    #mycost = ...

    optimizer = tf.train.AdamOptimizer()
    gradients = optimizer.compute_gradients(mycost, var_list=[net.trainable_weights])
    # perform some operations on the gradients
    # gradients = ...
    trainstep = optimizer.apply_gradients(gradients)

I get the same behavior with and without the Dropout layer, even with a dropout rate of 1. How can I solve this?
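
For reference, here is a minimal way to observe this (a sketch; the probe input is arbitrary). If dropout were active, two evaluations of the same input would zero out different units and disagree:

    import numpy as np

    probe = np.ones((1, 1), dtype=np.float32)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        out1 = net.output.eval({net_input: probe})
        out2 = net.output.eval({net_input: probe})
        print(np.allclose(out1, out2))  # True -> dropout is effectively disabled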


Solution

  • As @Sharky already said, you can use the training argument when invoking the call() method of the Dropout class. However, if you want to train in TensorFlow graph mode, you need to pass a placeholder and feed it a boolean value during training. Here is an example of fitting Gaussian blobs, adapted to your case:

    import tensorflow as tf
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.layers import Dropout
    from tensorflow.keras.layers import ReLU
    from tensorflow.keras.layers import Input
    from tensorflow.keras import Model
    
    x_train, y_train = make_blobs(n_samples=10,
                                  n_features=2,
                                  centers=[[1, 1], [-1, -1]],
                                  cluster_std=1)
    
    x_train, x_test, y_train, y_test = train_test_split(
        x_train, y_train, test_size=0.2)
    
    # `istrain` tells the Dropout layer whether we are training or inferring
    istrain = tf.placeholder(tf.bool, shape=())
    y = tf.placeholder(tf.int32, shape=(None,))
    net_input = Input(shape=(2,))
    net_1 = Dense(2)
    net_2 = Dense(2)
    net_3 = Dropout(0.5)
    # pass the placeholder as the `training` argument of the Dropout call
    net = Model(net_input, net_3(net_2(net_1(net_input)), training=istrain))
    
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=y, logits=net.output)
    loss_fn = tf.reduce_mean(xentropy)
    
    optimizer = tf.train.AdamOptimizer(0.01)
    grads_and_vars = optimizer.compute_gradients(loss_fn,
                                                 var_list=net.trainable_variables)
    trainstep = optimizer.apply_gradients(grads_and_vars)
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        l1 = loss_fn.eval({net_input:x_train,
                           y:y_train,
                           istrain:True}) # apply dropout
        print(l1) # 1.6264652
        l2 = loss_fn.eval({net_input:x_train,
                           y:y_train,
                           istrain:False}) # no dropout
        print(l2) # 1.5676715
        sess.run(trainstep, feed_dict={net_input:x_train,
                                       y:y_train, 
                                       istrain:True}) # train with dropout
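
A related option, if you would rather not manage your own placeholder: when a Keras layer is called without an explicit training argument, it falls back to the global Keras learning-phase flag, which you can feed directly in TF 1.x. A minimal sketch of that variant (assuming the same net, loss_fn, and trainstep graph as above, but built without training=istrain):

    from tensorflow.keras import backend as K

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # K.learning_phase() is the boolean tensor Keras layers consult by default
        sess.run(trainstep, feed_dict={net_input: x_train,
                                       y: y_train,
                                       K.learning_phase(): 1})  # 1 = training, dropout on
        l = loss_fn.eval({net_input: x_train,
                          y: y_train,
                          K.learning_phase(): 0})  # 0 = inference, dropout off

The trade-off is global state versus an explicit graph input: the dedicated istrain placeholder makes the mode an explicit part of the graph, while the learning-phase flag is convenient when several Keras layers should switch modes together.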