I am a master's student currently studying NLP. I was reading the ELECTRA paper by Clark et al. and had a few questions about the implementation and training.
I was wondering if you could help me with those.
Thanks in advance.
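Well, I tried looking online for answers, but what I found was not conclusive. Regarding backpropagating the gradients, I think the discriminator's gradients are not backpropagated to the generator (they can't be, because the generator's output is sampled). The two models are still trained jointly on a combined loss, but the generator is updated only by its own MLM loss, even though the tokens it generates at the current step are fed as input to the discriminator.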
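To make the gradient flow concrete, here is a minimal pure-Python sketch of one ELECTRA-style step's data flow. It is not the authors' code: `build_discriminator_input`, `discriminator_bce`, and `combined_loss` are hypothetical helpers, a dict lookup stands in for the generator's sampler, and the averaging over positions and the `disc_weight` default are assumptions to check against the paper. The point it illustrates is that the discriminator only ever sees discrete sampled tokens, so in a real autograd framework its loss has no path back to the generator's parameters.

```python
import math


def build_discriminator_input(tokens, masked_positions, sample_fn):
    """Fill masked positions with generator samples and label every token.

    Label 1 means "replaced", 0 means "original". A sampled token that happens
    to equal the original token is labeled original, as the paper describes.
    """
    corrupted = list(tokens)
    for i in masked_positions:
        # Sampling yields discrete tokens, so in an autograd framework there
        # is no gradient path from here back to the generator.
        corrupted[i] = sample_fn(i)
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


def discriminator_bce(probs_replaced, labels, eps=1e-12):
    """Binary cross-entropy of per-token "replaced" probabilities,
    averaged over positions (the normalization is an assumption)."""
    total = 0.0
    for p, y in zip(probs_replaced, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def combined_loss(mlm_loss, disc_loss, disc_weight=50.0):
    """Joint objective: one optimizer step on this sum updates both models.

    The generator's parameters receive gradient only from mlm_loss, because
    disc_loss depends on the generator solely through the sampled tokens.
    disc_weight is an assumed default here; check the paper's setting.
    """
    return mlm_loss + disc_weight * disc_loss


# Usage with hypothetical generator samples for positions 1 and 2.
tokens = ["the", "chef", "cooked", "the", "meal"]
samples = {1: "chef", 2: "ate"}
corrupted, labels = build_discriminator_input(tokens, [1, 2], samples.get)
print(corrupted)  # ['the', 'chef', 'ate', 'the', 'meal']
print(labels)     # [0, 0, 1, 0, 0]
```

Note that position 1 is labeled 0 even though it was masked, because the generator happened to sample the original token. In PyTorch or JAX the same structure holds: the sampled token IDs are integers, so the discriminator term contributes zero gradient to the generator.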
Okay, the answers can be found in the paper itself.