Tags: python, machine-learning, scikit-learn, mlp

How to train a model for XOR using scikit-learn?


Is there a magic sequence of parameters that would allow the model to infer correctly on data it hasn't seen before?

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
                activation='logistic',
                max_iter=100,
                hidden_layer_sizes=(2,),
                solver='lbfgs')
X = [[0, 0],  # 3 samples, 2 features
     [0, 1],
#    [1, 0],
     [1, 1]]
y = [0,
     1,
#    1,
     0]  # classes of each sample
clf.fit(X, y)

assert clf.predict([[0, 1]]) == [1]
assert clf.predict([[1, 0]]) == [1]

Solution

  • How about using a kernel? A kernel is a way for a model to extract the desirable features from data.

    Commonly used kernels may not satisfy your requirement. I believe they try to find a 'cut' hyperplane between one hyperplane which contains [0, 0] and [1, 1] and another hyperplane which contains [0, 1].

    In 2-dimensional space, for example, one hyperplane is the line y = x and the other is y = x + 1. The 'cut' hyperplane could then be y = x + 1/2.

    So I suggest the following kernel.

    import numpy as np

    def kernel(X1, X2):
        # map each sample [x0, x1] to the single feature (x0 - x1) ** 2
        X1 = np.array([[(x[0] - x[1]) ** 2] for x in X1])
        X2 = np.array([[(x[0] - x[1]) ** 2] for x in X2])
        return np.dot(X1, X2.T)
    

    What this kernel does is this: it squares the difference between the two input scalars, (x - y)². With this way of feature extraction, the data will be featurized like the following:

    • [0, 0] → [0]
    • [0, 1] → [1]
    • [1, 1] → [0]

    And also for the unseen datum:

    • [1, 0] → [1]
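
    As a quick sanity check (my own addition, not part of the original answer), the squared-difference featurization above can be reproduced directly:

    ```python
    import numpy as np

    def featurize(X):
        # map each sample [x0, x1] to the single feature (x0 - x1) ** 2
        return np.array([[(x[0] - x[1]) ** 2] for x in X])

    print(featurize([[0, 0], [0, 1], [1, 1], [1, 0]]).ravel().tolist())  # → [0, 1, 0, 1]
    ```

    In this 1-dimensional feature space the two classes are trivially separable, which is exactly what the custom kernel exploits.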

    So a classifier trained with this kernel will predict as you desire ([1, 0] → [1]):

    from sklearn import svm

    clf = svm.SVC(kernel=kernel, max_iter=100)

    Model selection is very important in machine learning. A model which does not know that [0, 0] and [1, 1] belong to one group while [0, 1] and [1, 0] belong to the other may not make the prediction you expect.
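
    Putting the pieces together, here is a complete, runnable sketch (my own assembly of the answer's ingredients, not code from the original): it trains svm.SVC with the custom kernel on the three seen points, checks the unseen [1, 0], and contrasts it with a default RBF-kernel SVC, which has no built-in reason to group [1, 0] with [0, 1]:

    ```python
    import numpy as np
    from sklearn import svm

    def kernel(X1, X2):
        # squared-difference feature (x0 - x1) ** 2, then a linear kernel on it
        X1 = np.array([[(x[0] - x[1]) ** 2] for x in X1])
        X2 = np.array([[(x[0] - x[1]) ** 2] for x in X2])
        return np.dot(X1, X2.T)

    X = [[0, 0], [0, 1], [1, 1]]  # [1, 0] is deliberately held out
    y = [0, 1, 0]

    custom = svm.SVC(kernel=kernel, max_iter=100).fit(X, y)
    rbf = svm.SVC().fit(X, y)  # default RBF kernel, for comparison

    print(custom.predict([[1, 0]]))  # the custom kernel generalizes to the unseen point
    print(rbf.predict([[1, 0]]))     # RBF sides with the two nearer class-0 neighbours
    ```

    The contrast illustrates the model-selection point above: the custom kernel bakes in the knowledge that only |x0 - x1| matters, while the RBF kernel, seeing [1, 0] closer to [0, 0] and [1, 1] than to [0, 1], predicts the wrong class.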