Tags: python, tensorflow, machine-learning, scikit-learn, neural-network

Why can't sklearn MLPClassifier predict xor?


In theory, an MLP with a single hidden layer of just 3 neurons is enough to predict XOR correctly. Training may sometimes fail to converge, but 4 neurons are a safe bet.
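To make the "4 neurons" claim concrete, here is a minimal sketch with hand-picked weights rather than trained ones. It assumes ReLU hidden units (the question does not say which activation the reference example uses) and relies on the identity |a - b| > |a + b| exactly when a and b have opposite signs. The names `W1` and `w2` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (10000, 2))
pos = x > 0
y = pos[:, 0] ^ pos[:, 1]  # True where the two coordinates have different signs

# Hidden layer (assumed ReLU, no biases): the four units compute
# relu(x1 + x2), relu(-x1 - x2), relu(x1 - x2), relu(-x1 + x2)
W1 = np.array([[1.0, -1.0, 1.0, -1.0],
               [1.0, -1.0, -1.0, 1.0]])

# Output layer: |x1 - x2| - |x1 + x2|, using |z| = relu(z) + relu(-z)
w2 = np.array([-1.0, -1.0, 1.0, 1.0])

hidden = np.maximum(x @ W1, 0.0)
score = hidden @ w2
pred = score > 0  # positive score means the signs differ

accuracy = np.mean(pred == y)
print(f'Accuracy with hand-set weights: {accuracy}')
```

This only shows that a 4-unit solution exists; whether gradient descent finds it depends on the initialization, learning rate and number of iterations.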

Here's an example in the TensorFlow Playground.

I've tried to reproduce this using sklearn.neural_network.MLPClassifier:

from sklearn import neural_network
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np


x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

model = neural_network.MLPClassifier(
    hidden_layer_sizes=(3,), n_iter_no_change=100,
    learning_rate_init=0.01, max_iter=1000
).fit(x_train, y_train)

x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1

prediction = model.predict(x_test)
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')

I only get around 0.75 accuracy, while the TensorFlow Playground model is perfect. Any idea what makes the difference?

I also tried TensorFlow:

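import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score

model = tf.keras.Sequential(layers=[
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation='relu'),
    # No activation here: the output is a raw, unbounded score
    tf.keras.layers.Dense(1)
])

# binary_crossentropy defaults to from_logits=False, i.e. it expects probabilities;
# no optimizer is given, so Keras falls back to its default ('rmsprop')
model.compile(loss=tf.keras.losses.binary_crossentropy)

# Label is True where exactly one coordinate is positive (XOR of the signs)
x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = (tmp[:, 0] ^ tmp[:, 1])

# fit() defaults to epochs=1, i.e. a single pass over the training data
model.fit(x=x_train, y=y_train)

x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = (tmp[:, 0] ^ tmp[:, 1])

# predict() returns an array of shape (1000, 1); the raw outputs are thresholded at 0.5
prediction = model.predict(x_test) > 0.5
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')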

With this model I get results similar to the scikit-learn model, so it's not just a scikit-learn issue. Am I missing some important hyper-parameter?

Edit

OK, I changed the loss from cross-entropy to mean squared error, and now the TensorFlow example reaches 0.92 accuracy. I guess that's the problem with the MLPClassifier?
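One possible contributor to the gap between the two losses, offered as a guess rather than a diagnosis: binary cross-entropy is defined on probabilities, while a linear output layer produces unbounded scores. The NumPy sketch below is a stand-in, not the Keras implementation. It assumes the loss clips predictions into (0, 1) with a small `eps` (the value `1e-7` is an assumption), and shows that once a raw score leaves that range, making it worse no longer changes the loss, whereas passing the score through a sigmoid keeps the loss sensitive.

```python
import numpy as np

def bce(y_true, p, eps=1e-7):
    """Binary cross-entropy on probabilities; eps is an assumed clipping value."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y_true = np.array([1.0])
score_a = np.array([-2.0])  # hypothetical raw output, wrong side
score_b = np.array([-5.0])  # hypothetical raw output, even further off

# Raw scores fed directly to the loss: both clip to eps, so the loss is identical
loss_raw_a = bce(y_true, score_a)
loss_raw_b = bce(y_true, score_b)

# Scores squashed by a sigmoid first: the worse score gets the larger loss
loss_sig_a = bce(y_true, sigmoid(score_a))
loss_sig_b = bce(y_true, sigmoid(score_b))

print(f'raw:     {loss_raw_a:.3f} vs {loss_raw_b:.3f}')
print(f'sigmoid: {loss_sig_a:.3f} vs {loss_sig_b:.3f}')
```

Mean squared error has no such restriction on its inputs, which would be consistent with it behaving better on a linear output. Note that this says nothing about MLPClassifier itself, which applies a logistic output for binary targets.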


Solution

  • Increasing the learning rate and/or the maximum number of iterations seems to make the sklearn version work. Different solvers probably need different values for these, and it's not clear to me what the TensorFlow Playground uses.
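A rough way to check this is to train the same small network at a few learning rates and compare test accuracy. The sketch below is only an illustration: the learning rates, `max_iter=500`, the 4-unit hidden layer, the sample sizes and the fixed seeds are all assumptions chosen for a quick run, not values taken from the question or from the Playground.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_xor(n, rng):
    """Uniform points in [-1, 1]^2, labelled 1 where the coordinate signs differ."""
    x = rng.uniform(-1, 1, (n, 2))
    pos = x > 0
    return x, (pos[:, 0] ^ pos[:, 1]).astype(int)

rng = np.random.default_rng(0)
x_train, y_train = make_xor(2000, rng)
x_test, y_test = make_xor(1000, rng)

results = {}
for lr in (0.001, 0.01, 0.1):  # assumed grid of learning rates
    model = MLPClassifier(
        hidden_layer_sizes=(4,),
        learning_rate_init=lr,
        max_iter=500,
        random_state=0,  # fixed seed so runs are repeatable
    )
    model.fit(x_train, y_train)  # may emit a ConvergenceWarning for small lr
    results[lr] = model.score(x_test, y_test)  # mean accuracy on the test set
    print(f'learning_rate_init={lr}: accuracy={results[lr]:.3f}')
```

Results vary with the seed, so it is worth repeating the loop with several `random_state` values before drawing conclusions about any one setting.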