I would like to build a DCGAN for MNIST by myself in TensorFlow. However, I'm struggling to figure out how to set up the loss function for the generator. In a Keras DCGAN implementation the author used a little "workaround" for this problem: he simply built 3 models: the generator (G), the discriminator (D), and a third one that combines G with D, with the trainability of D set to false there.
This way, he can feed D with real and generated images to train D, and train G through the combined G+D model: since D is not trainable in that combined model, D's loss is propagated back to G.
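For reference, a rough sketch of that stacked-model workaround might look like this (all helper names, shapes and hyperparameters here are my own placeholders, not the original author's code):

from keras.models import Model
from keras.layers import Input

G = build_generator()        # hypothetical helper returning the generator model
D = build_discriminator()    # hypothetical helper returning the discriminator model
D.compile(optimizer='adam', loss='binary_crossentropy')

D.trainable = False                      # freeze D inside the combined model
z = Input(shape=(100,))                  # latent noise vector
combined = Model(z, D(G(z)))             # G stacked with the frozen D
combined.compile(optimizer='adam', loss='binary_crossentropy')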
In TensorFlow, I've built G and D already. Training D is relatively simple, since I just need to combine a batch of real MNIST training images with generated ones and call the training op:
session.run(D_train_op,
feed_dict={x: batch_x, y: batch_y})
The training op in this example minimizes a cross-entropy loss:
tf.losses.softmax_cross_entropy(y, D_out)
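For context, here is a minimal sketch of how such a D_train_op could be wired up (the discriminator helper and the learning rate are assumptions on my part, not code from above):

D_out = discriminator(x)                 # discriminator logits for the real/fake classes
D_loss = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=D_out)
D_train_op = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(D_loss)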
...but how would I set up the loss function for G when I do not have a "stacked" model combining G and D into a single, third model?
I know that I have to generate a batch of images with G, feed them into D and then obtain D's loss... however, the output of G has shape (batch_size, 28, 28, 1). How would I set up a loss function for G by hand?
Without the combined "G and D" model workaround, I would have to propagate D's loss, whose output has shape (batch_size, 1), back to the output layer of G.
If G were doing some kind of classification, for example, this wouldn't be that hard to figure out... but G outputs images, so I cannot directly map D's loss to G's output layer.
Do I have to set up a third model combining G+D? Or is there a way to calculate the loss for G by hand?
Any help is highly appreciated :)
In the generator training step, you can think of the network as involving the discriminator too, but for the backpropagation you only update the generator's weights. A good explanation of this can be found here.
As mentioned in the original paper, the discriminator cost is:

$$J^{(D)} = -\mathbb{E}_{x \sim p_\text{data}}\big[\log D(x)\big] - \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$

And the (non-saturating) generator cost is:

$$J^{(G)} = -\mathbb{E}_{z \sim p_z}\big[\log D(G(z))\big]$$
Of course, you don't need to compute the gradients by hand; TensorFlow already handles that. To implement the whole process, you can do the following:
G_sample = generator(z)               # G(z): images generated from noise z
D_real = discriminator(X)             # D(x): probability that a real image is real
D_fake = discriminator(G_sample)      # D(G(z)): probability that a generated image is real

D_loss = tf.reduce_mean(-tf.log(D_real) - tf.log(1 - D_fake))
G_loss = tf.reduce_mean(-tf.log(D_fake))
where D_real and D_fake are the discriminator's outputs (the probabilities from its last layer) for real and generated images, and G_sample is the generator's output. Then you can implement the training process in the standard way:
D_solver = (tf.train.AdamOptimizer(learning_rate=0.0001, beta1=0.5)
            .minimize(D_loss, var_list=theta_D))
G_solver = (tf.train.AdamOptimizer(learning_rate=0.0001, beta1=0.5)
            .minimize(G_loss, var_list=theta_G))
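Here theta_D and theta_G are the lists of the discriminator's and generator's trainable variables, so each optimizer only updates its own sub-network. One common way to collect them, assuming the two sub-networks were built inside variable scopes named "discriminator" and "generator" (that scoping is my assumption), is:

theta_D = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='discriminator')
theta_G = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='generator')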
And just run the solvers in a session.
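A bare-bones training loop could then look roughly like this (X and z are assumed to be placeholders, and next_real_batch, sample_z, num_steps, batch_size and z_dim are hypothetical helpers/constants, not part of the code above):

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        X_batch = next_real_batch(batch_size)    # hypothetical: batch of real MNIST images
        z_batch = sample_z(batch_size, z_dim)    # hypothetical: batch of noise vectors

        # one discriminator update, then one generator update
        _, d_loss_cur = sess.run([D_solver, D_loss], feed_dict={X: X_batch, z: z_batch})
        _, g_loss_cur = sess.run([G_solver, G_loss], feed_dict={z: z_batch})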