Tags: tensorflow, machine-learning, scikit-learn, logistic-regression, softmax

Softmax logistic regression: Different performance by scikit-learn and TensorFlow


I'm trying to learn a simple linear softmax model on some data. LogisticRegression in scikit-learn seems to work fine, but when I port the code to TensorFlow the performance is noticeably worse. I understand that the results will not be exactly equal (scikit-learn has regularization parameters, etc.), but the gap is too large to explain that way.

import pandas as pd
import tensorflow as tf
from sklearn import linear_model
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import precision_score, recall_score, f1_score

total = pd.read_feather('testfile.feather')

labels = total['labels']
features = total[['f1', 'f2']]

print(labels.shape)
print(features.shape)

classifier = linear_model.LogisticRegression(C=1e5, solver='newton-cg', multi_class='multinomial')
classifier.fit(features, labels)
pred_labels = classifier.predict(features)

print("SCI-KITLEARN RESULTS: ")
print('\tAccuracy:', classifier.score(features, labels)) 
print('\tPrecision:', precision_score(labels, pred_labels, average='macro'))
print('\tRecall:', recall_score(labels, pred_labels, average='macro'))
print('\tF1:', f1_score(labels, pred_labels, average='macro'))

# now try softmax regression with tensorflow 
print("\n\nTENSORFLOW RESULTS: ")

## By default, the OneHotEncoder class will return a more efficient sparse encoding. 
## This may not be suitable for some applications, such as use with the Keras deep learning library. 
## In this case, we disabled the sparse return type by setting the sparse=False argument.
enc = OneHotEncoder(sparse=False)
enc.fit(labels.values.reshape(len(labels), 1)) # Reshape is required as the encoder expects 2D input
labels_one_hot = enc.transform(labels.values.reshape(len(labels), 1))
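# labels_one_hot is now a dense NumPy array of shape (n_samples, 5), one column per class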

# tf Graph Input
x = tf.placeholder(tf.float32, [None, 2]) # 2 input features
y = tf.placeholder(tf.float32, [None, 5]) # 5 output classes

# Set model weights
W = tf.Variable(tf.zeros([2, 5]))
b = tf.Variable(tf.zeros([5]))

# Construct model
pred = tf.nn.softmax(tf.matmul(x, W) + b) # Softmax

clas = tf.argmax(pred, axis=1)

# Minimize error using cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), axis=1))
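# Note: taking the log of a softmax output can be numerically unstable;
# tf.nn.softmax_cross_entropy_with_logits on the raw logits is the usual, more robust alternative.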
# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    # Training cycle
    for epoch in range(1000):
        # Run optimization op (backprop) and cost op (to get loss value)
        _, c = sess.run([optimizer, cost], feed_dict={x: features, y: labels_one_hot})

    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    class_out = clas.eval({x: features})

    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print("\tAccuracy:", accuracy.eval({x: features, y: labels_one_hot}))
    print('\tPrecision:', precision_score(labels, class_out, average='macro'))
    print('\tRecall:', recall_score(labels, class_out, average='macro'))
    print('\tF1:', f1_score(labels, class_out, average='macro'))

The output of this code is

(1681,)
(1681, 2)
SCI-KITLEARN RESULTS:
    Accuracy: 0.822129684711
    Precision: 0.837883361162
    Recall: 0.784522522208
    F1: 0.806251963817


TENSORFLOW RESULTS:
    Accuracy: 0.694825
    Precision: 0.735883666192
    Recall: 0.649145125846
    F1: 0.678045562185

I inspected the result of the one-hot encoding and the data itself, but I have no idea why the TensorFlow result is so much worse.

Any suggestions would be really appreciated.


Solution

  • The problem turned out to be simple: I just needed more epochs and a smaller learning rate (and for efficiency I switched to AdamOptimizer). The results are now equal, although the TF implementation is much slower. A sketch of the adjusted training setup follows the output below.

    (1681,)
    (1681, 2)
    SCI-KITLEARN RESULTS:
        Accuracy: 0.822129684711
        Precision: 0.837883361162
        Recall: 0.784522522208
        F1: 0.806251963817
    
    TENSORFLOW RESULTS:
        Accuracy: 0.82213
        Precision: 0.837883361162
        Recall: 0.784522522208
        F1: 0.806251963817
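
    As a rough illustration, here is a minimal sketch of the adjusted training setup: the graph is unchanged, only the optimizer, learning rate, and epoch count differ. The concrete values (learning rate 1e-3, 20000 epochs) are illustrative assumptions, not necessarily the exact settings behind the numbers above.

    # Same cost as before, but optimized with Adam at a smaller learning rate
    cost = tf.reduce_mean(-tf.reduce_sum(y * tf.log(pred), axis=1))
    optimizer = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Train for many more epochs than the original 1000
        for epoch in range(20000):
            _, c = sess.run([optimizer, cost],
                            feed_dict={x: features, y: labels_one_hot})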