Tags: tensorflow, speech-recognition, autoencoder, rbm

Dimension Reduction in CLDNN (tensorflow)


I'm trying to write an implementation of a CLDNN with TensorFlow, like the one in this scheme. I am having a problem with the dimension reduction layer.

As far as I understand it, it is made of several stacked Restricted Boltzmann Machines (RBMs) and works like an autoencoder. The decoder part of the layer is only there to train the encoder to reduce the dimensions well, meaning that you want to "plug" the encoder's output into the next layer's input.

I can define a loss function that will train the autoencoder (by comparing the input with the decoded output), and another loss function that will train the whole graph. Is there a way to train with these two loss functions? Or maybe I am misunderstanding the problem here, but it feels to me that the decoder part of the autoencoder is kind of left "outside the loop" and won't be trained.

I have found implementations of such autoencoders, convolutional layers, etc., but I don't really understand how to "insert" the autoencoder inside the network (like in the scheme).
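For what it's worth, training two objectives at once is usually done by minimizing their weighted sum with a single optimizer; gradients from the reconstruction term then reach the decoder, and the encoder receives gradients from both terms. Below is a minimal numpy sketch of that idea (not TensorFlow code, and not the paper's method); all sizes, the toy data, and the weighting factor `lam` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 256-dim input reduced to a 32-dim code, 10 classes.
x = rng.standard_normal((8, 256))            # batch of inputs
y = rng.integers(0, 10, size=8)              # fake class labels
W_enc = rng.standard_normal((256, 32)) * 0.01
W_dec = rng.standard_normal((32, 256)) * 0.01
W_out = rng.standard_normal((32, 10)) * 0.01

code = x @ W_enc                             # encoder output, fed to the next layer
recon = code @ W_dec                         # decoder output, used only for training
logits = code @ W_out                        # stands in for the rest of the network

recon_loss = np.mean((recon - x) ** 2)       # autoencoder objective

# Softmax cross-entropy as the "whole graph" objective.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
task_loss = -np.mean(np.log(probs[np.arange(8), y]))

# One optimizer minimizes the weighted sum, so nothing is "outside the loop":
# recon_loss trains the decoder, and both terms train the encoder.
lam = 0.1                                    # hypothetical weighting factor
total_loss = task_loss + lam * recon_loss
```

In TensorFlow the same pattern is just `total_loss = task_loss + lam * recon_loss` followed by a single `optimizer.minimize(total_loss)` (or one `GradientTape` step), since autodiff routes each term's gradients to the variables it depends on.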


Solution

  • Paper says

    The Computational Network Toolkit (CNTK) [24] is used for neural network training. As [14] suggests, we apply uniform random weight initialization for all layers without either generative or discriminative pretraining [1].

    The dimension reduction in the diagram is simply a dense projection layer. So they do not train any autoencoders; they just configure the network architecture and train the whole network from the random initial state.

    Autoencoders were used before for subnetwork initialization, but they are not very popular now.