I am trying to figure out how to match activation=sigmoid and activation=softmax with the correct model.compile() loss parameters, specifically those associated with binary_crossentropy.
I have researched related topics and read the docs. I have also built a model and got it working with sigmoid, but not with softmax. And I cannot get it working properly with the "from_logits" parameter.
Specifically, here it says:
Args:
from_logits: Whether output is expected to be a logits tensor. By default, we consider that output encodes a probability distribution.
This says to me that if you use a sigmoid activation you want "from_logits=True", and for a softmax activation you want the default "from_logits=False". Here I am assuming that sigmoid provides logits and softmax provides a probability distribution.
Next is some code:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

model = Sequential()
model.add(LSTM(units=128,
               input_shape=(n_timesteps, n_features),
               return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=32))
model.add(Dropout(0.3))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))
Notice the last line is using the sigmoid activation. Then:
model.compile(optimizer=optimizer,
              loss='binary_crossentropy',
              metrics=['accuracy'])
This works fine, but it is using the default "from_logits=False", which expects a probability distribution.
If I do the following, it fails:
model.compile(optimizer=optimizer,
              loss='binary_crossentropy',
              metrics=['accuracy'],
              from_logits=True)  # for the 'sigmoid' in the Dense layer above
with this error message:
ValueError: Invalid argument "from_logits" passed to K.function with TensorFlow backend
If I try using the softmax activation as:
model.add(Dense(1, activation='softmax'))
It runs, but I get 50% accuracy results. With sigmoid I am getting 99%+ accuracy. (I am using a very contrived data set to debug my models and would expect very high accuracy. It is also a very small data set and will overfit, but that is OK for now.)
So I expect that I should be able to use the "from_logits" parameter in the compile function, but it does not recognize that parameter. Also, I would like to know why it works with the sigmoid activation and not the softmax activation, and how I can get it working with the softmax activation.
Thank you,
Jon.
To use from_logits in your loss function, you must pass it into the BinaryCrossentropy object initialization, not into model.compile().
You must change this:
model.compile(optimizer=optimizer,
              loss='binary_crossentropy',
              metrics=['accuracy'],
              from_logits=True)
to this:
import tensorflow as tf

model.compile(optimizer=optimizer,
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
However, if you are using a softmax or sigmoid in the final layer of the network, you do not need from_logits=True. Softmax and sigmoid output normalized values in [0, 1], which are treated as probabilities in this context.
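If you did want to train on logits instead, the usual pattern is to drop the activation from the final layer and let the loss apply the sigmoid internally. Here is a minimal sketch (the toy input shape of 8 features is an assumption for illustration, not taken from your model):

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Minimal sketch: the final Dense layer has no activation, so it outputs
# raw logits; the loss is told to apply the sigmoid internally.
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),  # assumed toy input: 8 features
    Dense(1)  # linear output = logits
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])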
See this question for more information: What is the meaning of the word logits in TensorFlow?
Now to fix your 50% accuracy issue with softmax, change the following code from this:
model.add(Dense(1, activation='softmax'))
to this:
model.add(Dense(2, activation='softmax')) # number of units = number of classes
Remember that when you are using softmax, you are outputting the probability of the example belonging to each class. For this reason, you need a unit for each possible class, which in a binary classification context will be 2 units.
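For completeness, here is a minimal sketch of the two-unit softmax variant. Note the assumptions: integer class labels (0 or 1) and a switch from binary_crossentropy to sparse_categorical_crossentropy, since a two-unit softmax output no longer matches the single-probability shape that binary_crossentropy expects. (Again, the input shape is a toy assumption.)

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Minimal sketch: one softmax unit per class for binary classification.
# Labels are assumed to be integers (0 or 1), hence the sparse
# categorical loss rather than binary_crossentropy.
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),  # assumed toy input: 8 features
    Dense(2, activation='softmax')  # one unit per class
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])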