I built a CNN model for image classification where each image belongs to exactly one class. The output tensor is a list of 65 elements. I feed this tensor into a Softmax function and take its output as the classification result. But the largest value in the output tensor already tells me the predicted class, so why not just use the max to do the classification task? Is the only reason that the Softmax function is easy to differentiate?
Softmax is used for multi-class classification. In multi-class classification the model is expected to assign each input to a single class with high probability, and predicting one class with high probability forces the probabilities of the other classes to be low.
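A small sketch of that last point in plain Python, using made-up logits for a hypothetical 3-class problem: raising one logit raises its softmax probability and, because the probabilities must sum to 1, lowers all the others. It also shows that the predicted class (the argmax) is the same whether taken over the raw logits or over the softmax output, because softmax preserves the ordering of its inputs. So at prediction time taking the max is fine; softmax matters for training.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical logits for a 3-class problem.
before = [1.0, 2.0, 0.5]
after = [1.0, 4.0, 0.5]  # only the logit of class 1 was increased

p_before = softmax(before)
p_after = softmax(after)

print("before:", [round(p, 3) for p in p_before])
print("after: ", [round(p, 3) for p in p_after])

# Class 1 gains probability; every other class loses some.
assert p_after[1] > p_before[1]
assert p_after[0] < p_before[0] and p_after[2] < p_before[2]

# Softmax preserves the ordering of the logits, so the predicted class is unchanged.
assert argmax(before) == argmax(p_before) == 1
```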
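As you stated, one reason to use softmax instead of max is that softmax is differentiable everywhere on the real numbers, while the hard max (a one-hot vector marking the largest value) is not: it is piecewise constant, so its gradient is zero almost everywhere and undefined at ties, which gives gradient descent nothing to learn from.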
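To make the differentiability point concrete, here is a sketch in plain Python (the helper names are illustrative, not from any library). It compares the analytic softmax Jacobian, d s_i / d z_j = s_i * (delta_ij - s_j), against central finite differences, then applies the same finite-difference check to a one-hot hard max. The logits reuse the 4-element example from the next paragraph.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    t = sum(e)
    return [v / t for v in e]

def hardmax(z):
    """One-hot vector with 1.0 at the index of the largest logit."""
    k = max(range(len(z)), key=lambda i: z[i])
    return [1.0 if i == k else 0.0 for i in range(len(z))]

def softmax_jacobian(z):
    """Analytic Jacobian: d s_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    n = len(z)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def numeric_jacobian(f, z, eps=1e-6):
    """Central differences: column j is (f(z + eps*e_j) - f(z - eps*e_j)) / (2*eps)."""
    n = len(z)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        zp = list(z)
        zp[j] += eps
        zm = list(z)
        zm[j] -= eps
        fp, fm = f(zp), f(zm)
        for i in range(n):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

z = [0.5, 0.5, 0.69, 0.7]

# Softmax: the analytic gradient matches the numerical one.
J_soft = softmax_jacobian(z)
J_num = numeric_jacobian(softmax, z)
max_err = max(abs(a - b) for ra, rb in zip(J_soft, J_num) for a, b in zip(ra, rb))
print("softmax: max |analytic - numeric| =", max_err)

# Hard max: small perturbations never change the winner, so every gradient is zero.
J_hard = numeric_jacobian(hardmax, z)
print("hard max: all gradients zero?", all(v == 0.0 for row in J_hard for v in row))
```

Each row of the softmax Jacobian sums to zero, because the outputs always sum to 1: pushing one probability up must push the others down by the same total amount.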
There are other properties of the softmax function that make it better suited to neural networks than max. First, it is a soft version of the max function. Say the network outputs 4 logits, [0.5, 0.5, 0.69, 0.7]. Hard max returns 1 for the index of the maximum (here the 4th index) and 0 for every other index. This loses information: the fact that the 3rd logit (0.69) is almost as large as the 4th (0.7) is thrown away, whereas softmax keeps these two outputs close to each other.

Second, the outputs of the softmax function lie in the interval [0, 1] and sum to 1. For this reason they can be interpreted as probabilities, i.e. as the model's confidence that the input belongs to each of the output classes.
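A minimal sketch in plain Python comparing hard max and softmax on the example logits above. The function names are illustrative. Hard max keeps only the winner, while softmax leaves the 3rd and 4th classes nearly tied and produces values in [0, 1] that sum to 1.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    t = sum(e)
    return [v / t for v in e]

def hardmax(z):
    """One-hot 'hard max': 1.0 at the largest logit, 0.0 elsewhere."""
    k = max(range(len(z)), key=lambda i: z[i])
    return [1.0 if i == k else 0.0 for i in range(len(z))]

logits = [0.5, 0.5, 0.69, 0.7]  # the example from the text

hard = hardmax(logits)
soft = softmax(logits)

print("hard max:", hard)
print("softmax: ", [round(p, 4) for p in soft])

# Hard max keeps only the winner; the near-tie between 0.69 and 0.7 is lost.
assert hard == [0.0, 0.0, 0.0, 1.0]
# Softmax keeps that information: the 3rd and 4th probabilities are almost equal.
assert abs(soft[2] - soft[3]) < 0.01
# Softmax outputs lie in [0, 1] and sum to 1, so they can be read as probabilities.
assert all(0.0 <= p <= 1.0 for p in soft)
assert abs(sum(soft) - 1.0) < 1e-12
```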