Tags: python, tensorflow, keras, loss-function, activation-function

How to specify model.compile() for binary_crossentropy with activation=sigmoid and activation=softmax?


I am trying to figure out how to match activation=sigmoid and activation=softmax with the correct model.compile() loss parameters, specifically those associated with binary_crossentropy.

I have researched related topics and read the docs. I have also built a model and got it working with sigmoid, but not with softmax, and I cannot get it working properly with the "from_logits" parameter.

Specifically, here it says:

Args:
  • from_logits: Whether output is expected to be a logits tensor. By default, we consider that output encodes a probability distribution.

This says to me that if you use a sigmoid activation you want "from_logits=True", and for a softmax activation you want the default "from_logits=False". Here I am assuming that sigmoid provides logits and softmax provides a probability distribution.

Next is some code:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# n_timesteps and n_features are defined elsewhere in my script
model = Sequential()
model.add(LSTM(units=128,
               input_shape=(n_timesteps, n_features), 
               return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=64, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=32))
model.add(Dropout(0.3))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1, activation='sigmoid'))

Notice the last line is using the sigmoid activation. Then:

model.compile(optimizer=optimizer,
              loss='binary_crossentropy',  
              metrics=['accuracy'])

This works fine, but it is working with the default "from_logits=False", which expects a probability distribution.

If I do the following, it fails:

model.compile(optimizer=optimizer,
              loss='binary_crossentropy',  
              metrics=['accuracy'],
              from_logits=True) # For 'sigmoid' in above Dense

with this error message:

ValueError: Invalid argument "from_logits" passed to K.function with TensorFlow backend

If I try using the softmax activation as:

model.add(Dense(1, activation='softmax'))

It runs, but I get 50% accuracy results. With sigmoid I am getting 99%+ accuracy. (I am using a very contrived data set to debug my models and would expect very high accuracy. It is also a very small data set and will overfit, but that is OK for now.)

So I expect that I should be able to use the "from_logits" parameter in the compile function, but it does not recognize that parameter.

I would also like to know why it works with the sigmoid activation and not the softmax activation, and how to get it working with the softmax activation.

Thank you,

Jon.


Solution

  • To use from_logits in your loss function, you must pass it to the BinaryCrossentropy object when you initialize it, not to model.compile().

    You must change this:

    model.compile(optimizer=optimizer,
                  loss='binary_crossentropy',  
                  metrics=['accuracy'],
                  from_logits=True)
    

    to this:

    import tensorflow as tf

    model.compile(optimizer=optimizer,
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),  
                  metrics=['accuracy'])
    

    However, if you are using a softmax or sigmoid activation in the final layer of the network, you do not need from_logits=True. Softmax and sigmoid output normalized values in [0, 1], which are treated as probabilities in this context.

    See this question for more information: What is the meaning of the word logits in TensorFlow?
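
    If you did want to keep from_logits=True, the final layer would have to output raw logits, meaning no activation at all. A minimal sketch of that variant (my addition, not part of the original answer; model and optimizer are the ones from the question):

    import tensorflow as tf
    from tensorflow.keras.layers import Dense

    # No activation on the final layer, so it outputs raw logits
    model.add(Dense(1))
    model.compile(optimizer=optimizer,  # optimizer as defined in the question
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    This lets the loss apply the sigmoid internally, which is also slightly more numerically stable than computing the sigmoid in the layer.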


    Now to fix your 50% accuracy issue with softmax, change the following code from this:

    model.add(Dense(1, activation='softmax'))
    

    to this:

    model.add(Dense(2, activation='softmax'))  # number of units = number of classes
    

    Remember that when you are using softmax, you are outputting the probability of the example belonging to each class. For this reason, you need one unit per possible class, which in a binary classification context means 2 units. This also explains the 50% accuracy: softmax normalizes over the output units, so with a single unit it always outputs 1.0 and the model predicts the same class for every example. See the sketch below for the label and loss changes that go with a 2-unit output.
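
    One caveat the answer above does not spell out (so treat this as my assumption): with a 2-unit softmax output, binary_crossentropy no longer matches the label shape. You would typically one-hot encode the labels and switch to categorical_crossentropy, or keep integer labels and use sparse_categorical_crossentropy. A minimal sketch, assuming y is the integer label array (0 or 1) from the question's data set:

    import tensorflow as tf
    from tensorflow.keras.layers import Dense

    model.add(Dense(2, activation='softmax'))  # one unit per class

    # Option A: one-hot encode the labels and use categorical cross-entropy
    y_onehot = tf.keras.utils.to_categorical(y, num_classes=2)
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Option B: keep the integer labels and use the sparse variant
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])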