In theory, an MLP with a single hidden layer of just 3 neurons is enough to predict XOR correctly. It can sometimes fail to converge properly, but 4 neurons are a safe bet.
Here's an example in the TensorFlow Playground.
I've tried to reproduce this using sklearn.neural_network.MLPClassifier:
from sklearn import neural_network
from sklearn.metrics import accuracy_score, precision_score, recall_score
import numpy as np
x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1
model = neural_network.MLPClassifier(
    hidden_layer_sizes=(3,), n_iter_no_change=100,
    learning_rate_init=0.01, max_iter=1000
).fit(x_train, y_train)
x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = 2 * (tmp[:, 0] ^ tmp[:, 1]) - 1
prediction = model.predict(x_test)
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')
I only get around 0.75 accuracy, while the TensorFlow Playground model is perfect. Any idea what makes the difference?
I also tried using TensorFlow:
import tensorflow as tf
model = tf.keras.Sequential(layers=[
    tf.keras.layers.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.binary_crossentropy)
x_train = np.random.uniform(-1, 1, (10000, 2))
tmp = x_train > 0
y_train = (tmp[:, 0] ^ tmp[:, 1])
model.fit(x=x_train, y=y_train)
x_test = np.random.uniform(-1, 1, (1000, 2))
tmp = x_test > 0
y_test = (tmp[:, 0] ^ tmp[:, 1])
prediction = model.predict(x_test) > 0.5
print(f'Accuracy: {accuracy_score(y_pred=prediction, y_true=y_test)}')
print(f'recall: {recall_score(y_pred=prediction, y_true=y_test)}')
print(f'precision: {precision_score(y_pred=prediction, y_true=y_test)}')
With this model I get results similar to the scikit-learn model, so it's not just a scikit-learn issue. Am I missing some important hyper-parameter?
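As a sanity check that capacity is not the limiting factor, the 4-unit ReLU architecture can represent this target exactly with hand-picked weights. The sketch below is only an illustration: the weights `W1` and `w2` and the helper `hand_built_xor` are made up for this check and are not what any optimizer produced. It relies on the identity that |x1 - x2| - |x1 + x2| is positive exactly when x1 and x2 have opposite signs, and on |u| = relu(u) + relu(-u).

```python
import numpy as np

# Hand-picked weights (illustrative only, not learned).
# Hidden pre-activations: x1+x2, -(x1+x2), x1-x2, -(x1-x2)
W1 = np.array([[1.0, -1.0, 1.0, -1.0],
               [1.0, -1.0, -1.0, 1.0]])
# Output combines the ReLUs into |x1 - x2| - |x1 + x2|
w2 = np.array([-1.0, -1.0, 1.0, 1.0])

def hand_built_xor(x):
    """Forward pass of a 2-4-1 ReLU network with fixed weights."""
    hidden = np.maximum(x @ W1, 0.0)
    score = hidden @ w2
    # score > 0 exactly when the two inputs have opposite signs
    return score > 0

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, (1000, 2))
pos = x > 0
y = pos[:, 0] ^ pos[:, 1]

accuracy = np.mean(hand_built_xor(x) == y)
print(f'Accuracy of hand-built network: {accuracy}')
```

If this classifies the sampled points correctly, as the identity suggests it should, then the gap I'm seeing would come from the optimization setup (loss, learning rate, number of epochs, initialization) rather than from the size of the network.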
Edit
OK, I changed the loss to mean squared error instead of cross-entropy, and now the TensorFlow example reaches 0.92 accuracy. I guess that's the problem with the MLPClassifier?
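One possible explanation for why swapping the loss helped, offered as a guess rather than a diagnosis: the last layer, `Dense(1)`, has no activation, so it outputs unbounded scores, whereas `tf.keras.losses.binary_crossentropy` treats its input as probabilities unless `from_logits=True` is passed. The NumPy sketch below shows how differently the two readings score the same outputs. The helper names are made up for illustration, and the clipping constant `eps` is an assumption meant to mimic what Keras does internally.

```python
import numpy as np

def bce_treating_as_probs(y, p, eps=1e-7):
    """Cross-entropy when the model output is read as a probability.

    Values outside [eps, 1 - eps] are clipped (assumed to mirror Keras).
    """
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_from_logits(y, z):
    """Numerically stable cross-entropy when the output is a raw score."""
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

# Raw outputs a Dense(1) layer without activation could plausibly produce
scores = np.array([-2.0, 3.0])
labels = np.array([0.0, 1.0])

as_probs = bce_treating_as_probs(labels, scores)
as_logits = bce_from_logits(labels, scores)
print(f'read as probabilities: {as_probs:.2e}')
print(f'read as logits:        {as_logits:.4f}')
```

Read as probabilities, any score outside [0, 1] is clipped, so the loss no longer changes with it and carries no gradient there. If this is the cause, either giving the last layer `activation='sigmoid'` or using `tf.keras.losses.BinaryCrossentropy(from_logits=True)` should address it; in the latter case the prediction threshold would be 0 rather than 0.5.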
Increasing the learning rate and/or the maximum number of iterations seems to make the sklearn version work. Different solvers probably need different values for these, and it's not clear to me what the TensorFlow Playground is using.
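To see how sensitive the result is to these settings, a small sweep over the learning rate and several random seeds can help separate "bad hyper-parameter" from "unlucky initialization". This is a minimal sketch: the helpers `make_xor` and `xor_accuracies`, the smaller training set, and the particular values tried are my own choices for a quick experiment, not recommended settings.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_xor(n, rng):
    """Sample n points in [-1, 1]^2 labelled by the XOR of their signs."""
    x = rng.uniform(-1, 1, (n, 2))
    pos = x > 0
    return x, (pos[:, 0] ^ pos[:, 1]).astype(int)

def xor_accuracies(learning_rate_init, max_iter, seeds=(0, 1, 2), n_train=2000):
    """Test accuracy of a 3-unit MLP for each weight-initialization seed."""
    rng = np.random.default_rng(0)  # fixed data, only the init varies
    x_train, y_train = make_xor(n_train, rng)
    x_test, y_test = make_xor(1000, rng)
    scores = []
    for seed in seeds:
        model = MLPClassifier(
            hidden_layer_sizes=(3,),
            learning_rate_init=learning_rate_init,
            max_iter=max_iter,
            random_state=seed,
        )
        model.fit(x_train, y_train)
        scores.append(model.score(x_test, y_test))
    return scores

for lr in (0.01, 0.1):
    print(f'learning_rate_init={lr}: {xor_accuracies(lr, max_iter=200)}')
```

A large spread across seeds at a fixed learning rate would point to initialization and local minima; a consistent improvement at the higher rate would point to the optimizer simply stopping too early.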