Tags: python, machine-learning, scikit-learn, mlp

How to train a model for XOR using scikit-learn?


Is there a magic sequence of parameters that would allow the model to infer correctly on data it hasn't seen before?

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(
                activation='logistic',
                max_iter=100,
                hidden_layer_sizes=(2,),
                solver='lbfgs')
X = [[0, 0],  # 3 samples, 2 features
     [0, 1],
#    [1, 0],
     [1, 1]]
y = [0,
     1,
#    1,
     0]  # classes of each sample
clf.fit(X, y)

assert clf.predict([[0, 1]]) == [1]
assert clf.predict([[1, 0]]) == [1]

Solution

  • How about using a kernel? A kernel is a way for a model to extract the desirable features from data.

    Commonly used kernels may not satisfy your requirement. I believe they try to find a 'cut' hyperplane between one hyperplane which contains [0, 0] and [1, 1] and another hyperplane which contains [0, 1].

    In 2-dimensional space, for example, one hyperplane is the line y = x and the other is y = x + 1. The 'cut' hyperplane could then be y = x + 1/2.

    So I suggest the following kernel.

    import numpy as np

    def kernel(X1, X2):
        # map each sample [x0, x1] to the single feature (x0 - x1) ** 2
        X1 = np.array([[(x[0] - x[1]) ** 2] for x in X1])
        X2 = np.array([[(x[0] - x[1]) ** 2] for x in X2])
        return np.dot(X1, X2.T)
    

    What this kernel does is this: it squares the difference between the two input scalars, (x - y)². With this way of feature extraction, the data will be featurized like the following:

    • [0, 0] → [0]
    • [0, 1] → [1]
    • [1, 1] → [0]

    And also for the unseen datum:

    • [1, 0] → [1]
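
    As a quick sanity check (my own addition, not part of the original answer), the squared-difference featurization above can be reproduced directly:

    ```python
    import numpy as np

    def featurize(X):
        # map each sample [x0, x1] to the single feature (x0 - x1) ** 2
        return np.array([[(x[0] - x[1]) ** 2] for x in X])

    print(featurize([[0, 0], [0, 1], [1, 1], [1, 0]]).ravel().tolist())  # → [0, 1, 0, 1]
    ```

    In this 1-dimensional feature space the two classes are trivially separable, which is exactly what the custom kernel exploits.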

    So a classifier trained with this kernel will predict as you desire ([1, 0] → [1]):

    from sklearn import svm

    clf = svm.SVC(kernel=kernel, max_iter=100)

    Model selection is very important in machine learning. A model which does not know that [0, 0] and [1, 1] belong to one group while [0, 1] and [1, 0] belong to the other may not make the prediction you expect.
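
    Putting the pieces together, here is a complete, runnable sketch (my own assembly of the answer's ingredients, not code from the original): it trains svm.SVC with the custom kernel on the three seen points, checks the unseen [1, 0], and contrasts it with a default RBF-kernel SVC, which has no built-in reason to group [1, 0] with [0, 1]:

    ```python
    import numpy as np
    from sklearn import svm

    def kernel(X1, X2):
        # squared-difference feature (x0 - x1) ** 2, then a linear kernel on it
        X1 = np.array([[(x[0] - x[1]) ** 2] for x in X1])
        X2 = np.array([[(x[0] - x[1]) ** 2] for x in X2])
        return np.dot(X1, X2.T)

    X = [[0, 0], [0, 1], [1, 1]]  # [1, 0] is deliberately held out
    y = [0, 1, 0]

    custom = svm.SVC(kernel=kernel, max_iter=100).fit(X, y)
    rbf = svm.SVC().fit(X, y)  # default RBF kernel, for comparison

    print(custom.predict([[1, 0]]))  # the custom kernel generalizes to the unseen point
    print(rbf.predict([[1, 0]]))     # RBF sides with the two nearer class-0 neighbours
    ```

    The contrast illustrates the model-selection point above: the custom kernel bakes in the knowledge that only |x0 - x1| matters, while the RBF kernel, seeing [1, 0] closer to [0, 0] and [1, 1] than to [0, 1], predicts the wrong class.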