python · tensorflow · keras · logistic-regression

Tensorflow 2.0 - do these model predictions represent probabilities?


I have a very simple TensorFlow 2 Keras model for penalized logistic regression on some data. I was hoping to get the probabilities of each class, instead of just the predicted values of [0 or 1].

I think I got what I wanted, but I wanted to make sure these numbers are what I think they are. I used the model.predict_on_batch() method from tensorflow.keras, but the documentation only says that it returns a numpy array of predictions. I believe I am getting probabilities, but I was hoping someone could confirm.

The model code looks like this:

import tensorflow as tf
from tensorflow.keras import layers

feature_layer = tf.keras.layers.DenseFeatures(features)

model = tf.keras.Sequential([
    feature_layer,
    layers.Dense(1, activation='sigmoid', kernel_regularizer=tf.keras.regularizers.l1(0.01))
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

predictions = model.predict_on_batch(validation_dataset)

print('Predictions for a single batch.')
print(predictions)

So the predictions I am getting look like:

Predictions for a single batch.
tf.Tensor(
[[0.10916319]
 [0.14546806]
 [0.13057315]
 [0.11713684]
 [0.16197902]
 [0.19613355]
 [0.1388464 ]
 [0.14122346]
 [0.26149303]
 [0.12516734]
 [0.1388464 ]
 [0.14595506]
 [0.14595506]]

Now, predictions from a logistic regression would normally be an array of either 0 or 1, but here I am getting floating point values. Also, I am only getting a single value per example, even though there is both a probability that the example is a 0 and a probability that it is a 1, so I would have expected an array of 2 probabilities for each row or example. Of course, Probability(Y = 0) + Probability(Y = 1) = 1, so this might just be a concise representation.
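If the single value per row is indeed P(Y = 1), the two-column form can be recovered directly, since the other column is just 1 minus it. A minimal numpy sketch (not part of the original model code; the input values are copied from the printed predictions):

```python
import numpy as np

# Assumed: each value is P(Y = 1) for one example
p1 = np.array([[0.10916319],
               [0.14546806],
               [0.13057315]])

# Expand to the two-column form [P(Y = 0), P(Y = 1)]
probs = np.hstack([1.0 - p1, p1])

print(probs)              # each row is [P(Y=0), P(Y=1)]
print(probs.sum(axis=1))  # each row sums to 1
```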

So again, do the values in the array above represent the probability that the example's Y = 1, or something else?


Solution

  • The values represented here:

    tf.Tensor(
    [[0.10916319]
     [0.14546806]
     [0.13057315]
     [0.11713684]
     [0.16197902]
     [0.19613355]
     [0.1388464 ]
     [0.14122346]
     [0.26149303]
     [0.12516734]
     [0.1388464 ]
     [0.14595506]
     [0.14595506]]
    
    1. Are the predicted probabilities, one per example: since your model has a single sigmoid output unit, each value is the probability of the positive class (Y = 1).

    2. Since you used sigmoid activation on your last layer, these will be in the range [0, 1].

    3. Your model is very shallow (few layers), which may be why the predicted probabilities are so close to each other across examples. I suggest you add more layers.

    Conclusion

    To answer your question: yes, these are probabilities, but only because of your activation function choice (sigmoid). If you had used tanh activation, the outputs would fall in the range [-1, 1] and could not be interpreted as probabilities.
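    A quick numerical check of the two activation ranges, using plain numpy for illustration (the logit values are made up):

    ```python
    import numpy as np

    z = np.linspace(-10, 10, 1001)  # a sweep of pre-activation logits

    sigmoid = 1.0 / (1.0 + np.exp(-z))  # squashes into (0, 1)
    tanh = np.tanh(z)                   # squashes into (-1, 1)

    print(sigmoid.min(), sigmoid.max())  # stays strictly inside (0, 1)
    print(tanh.min(), tanh.max())        # stays strictly inside (-1, 1)
    ```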

    Note that each of these probabilities is "binary" due to the use of binary_crossentropy loss, i.e. 10.92% that the positive class is present and 89.08% that it is not, and so on for the other examples. If you want per-class predictions that follow probabilistic rules (summing to 1 across classes), you should consider a softmax output layer with categorical_crossentropy.
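    For the multi-class case, a softmax over per-class scores produces probabilities that sum to 1 by construction. A numpy sketch with hypothetical logits (not output from the model above):

    ```python
    import numpy as np

    def softmax(logits):
        # Subtract the row max for numerical stability, then normalize
        shifted = logits - logits.max(axis=-1, keepdims=True)
        exp = np.exp(shifted)
        return exp / exp.sum(axis=-1, keepdims=True)

    # Hypothetical per-class scores for three examples, two classes each
    logits = np.array([[-2.1, 0.3],
                       [0.0, 0.0],
                       [1.5, -0.5]])

    probs = softmax(logits)
    print(probs.sum(axis=1))  # each row sums to 1.0
    ```

    The equivalent change in the Keras model above would be a final layers.Dense(2, activation='softmax') compiled with loss='categorical_crossentropy' (or 'sparse_categorical_crossentropy' for integer labels).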