I implemented a model with several consecutive TimeDistributed layers. My last layer is defined as follows:
y_pred = TimeDistributed(Dense(output_dim, name="y_pred", kernel_initializer=init, bias_initializer=init, activation="softmax"), name="out")(x)
I would like to remove the "softmax" activation from this layer to access its logits, i.e.:
logit = TimeDistributed(Dense(output_dim, name="fc6", kernel_initializer=init, bias_initializer=init), name="logit")(x)
To get back the initial y_pred, I wrote:
(1) y_pred = TimeDistributed(Activation('softmax'), name="pred")(logit)
I'm confused because the following line also seems to work:
(2) y_pred = Activation('softmax', name="pred")(logit)
Which one is correct, (1) or (2)? Regards
Both follow the same semantics, because by default Activation('softmax') applies the activation to the last axis (axis=-1). So even if you wrap it in TimeDistributed, you are still applying softmax over the last dimension. Version (2), without TimeDistributed, would be faster as it involves fewer operations.
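If you want to convince yourself, here is a minimal NumPy sketch of the idea. It is not the Keras implementation, only an analogue: the softmax_last_axis helper and the (2, 5, 4) logit shape are made up for illustration, and the per-timestep loop stands in for what wrapping the activation in TimeDistributed amounts to conceptually.

```python
import numpy as np

def softmax_last_axis(z):
    """Numerically stable softmax over the last axis (axis=-1)."""
    shifted = z - z.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits with shape (batch, timesteps, output_dim).
rng = np.random.default_rng(0)
logit = rng.normal(size=(2, 5, 4))

# Analogue of (2): apply softmax once to the whole 3D tensor.
pred_plain = softmax_last_axis(logit)

# Analogue of (1): apply softmax to each timestep slice separately,
# then stack the results back along the time axis.
pred_distributed = np.stack(
    [softmax_last_axis(logit[:, t, :]) for t in range(logit.shape[1])],
    axis=1,
)

# Both routes normalise over output_dim, so the results should match.
same = np.allclose(pred_plain, pred_distributed)
print(same)
```

Since each timestep is normalised over output_dim in both cases, the two results agree; the per-timestep route just does extra slicing and stacking to get there.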