Separate activation from a TimeDistributed layer

I implemented a model with several consecutive TimeDistributed layers. My last layer is defined as follows:

y_pred = TimeDistributed(Dense(output_dim, name="y_pred", kernel_initializer=init, bias_initializer=init, activation="softmax"), name="out")(x)

I would like to remove the "softmax" activation from this layer so that I can access its logits, i.e.:

logit = TimeDistributed(Dense(output_dim, name="fc6", kernel_initializer=init, bias_initializer=init), name="logit")(x)

To get back the original y_pred, I wrote:

(1) y_pred = TimeDistributed(Activation('softmax'), name="pred")(logit)

I'm confused because the following line also seems to work:

(2) y_pred = Activation('softmax', name="pred")(logit)

Which one is correct, (1) or (2)? Regards

Solution

Both are correct and give the same result. By default, Activation('softmax') applies the softmax over the last axis (axis=-1). Wrapped in TimeDistributed, it is applied to each time step separately, but still over that same last dimension, so the output is identical. Version (2), without TimeDistributed, should be faster because it involves fewer operations.
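
To illustrate why the two versions agree, here is a minimal NumPy sketch. It is not Keras code: the softmax helper, the tensor shape (batch=2, time=5, output_dim=4), and the per-time-step loop are assumptions made for the example, standing in for what a time-distributed wrapper does conceptually.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis (hypothetical helper).
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 5, 4))  # assumed shape: (batch, time, output_dim)

# Like (2): softmax applied directly to the 3D tensor over the last axis.
direct = softmax(logits, axis=-1)

# Like (1): apply the same function to each time step slice of shape
# (batch, output_dim), then stack the results back along the time axis.
per_step = np.stack(
    [softmax(logits[:, t, :], axis=-1) for t in range(logits.shape[1])],
    axis=1,
)

# True when both approaches produce the same probabilities.
same = np.allclose(direct, per_step)
print(same)
```

Because the softmax only normalizes along the last axis, slicing by time step first does not change which values are normalized together.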