I am building an lstm model. I tested my model using softmax and sigmoid activation function. In the documentation sigmoid is used for binary classification and softmax is used for multiclass classification. But in my case, both are giving the same results. Why is it so?
Here is my code:
embedding_vecor_length = 128
max_length = 700
model = Sequential()
model.add(Embedding(len(tokenizer.word_index)+1, embedding_vecor_length, input_length=max_length))
model.add(Conv1D(filters=32, kernel_size=5, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Conv1D(filters=16, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling1D(pool_size=2))
model.add(Bidirectional(LSTM(64)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
Here are the predicted results:
[[2.72062905e-02 1.47979835e-03 4.44446778e-04 1.60833297e-05
4.15672457e-06 3.20438482e-02 9.38653767e-01 1.41544719e-04
5.55426550e-06 4.47654566e-06]
[2.31099591e-01 1.71699154e-03 1.32052042e-02 4.70457249e-04
8.86382014e-02 2.65704724e-03 6.54215395e-01 7.50611164e-03
4.89178114e-04 1.89376965e-06]
[1.24909900e-01 8.73659015e-01 9.71468398e-06 1.66079029e-04
1.05203628e-06 4.14116839e-05 3.97000113e-05 6.98190925e-05
1.10231712e-03 9.84829512e-07]
The sigmoid allows you to have high probability for all of your classes, some of them, or none of them. Example: classifying diseases in a chest x-ray image. The image might contain pneumonia, emphysema, and/or cancer, or none of those findings.
The softmax enforces that the sum of the probabilities of your output classes are equal to one, so in order to increase the probability of a particular class, your model must correspondingly decrease the probability of at least one of the other classes. Example: classifying images from the MNIST data set of handwritten digits. A single picture of a digit has only one true identity - the picture cannot be a 7 and an 8 at the same time.
So in your case if the model is good the prediction will not differ alot when either using sigmoid or softmax, softmax forces the sum of prediction to be 1 sigmoid doesn't do that.