Here is a TensorFlow graph. As we can see, one of the inputs to the cross-entropy op is the output of the logit layer, not the output of a softmax.
I searched about it and found the following warning on this webpage: "WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results."
My question is: how are the parameters of the softmax obtained if this softmax is never trained?
Softmax is a parameter-free activation function, like ReLU, tanh, or sigmoid: it doesn't need to be trained. It simply computes the exponential of every logit and then normalizes the resulting vector by the sum of those exponentials, so there are no weights to learn.
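To make this concrete, here is a minimal sketch of softmax in plain Python (the max-subtraction is a standard numerical-stability trick; the example logits are made up for illustration). Note there is nothing learnable in it, just exponentiation and normalization:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability;
    # this shifts the exponents but does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # probabilities in (0, 1) that sum to 1
print(sum(probs))  # 1.0 (up to floating-point error)
```

The only inputs are the logits themselves, which is why the op can fold the softmax into the cross-entropy computation internally: the trainable parameters all live in the layers that produce the logits.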