Tags: java, machine-learning, data-science, deeplearning4j

Deeplearning4j - 3 layer neural network not fitting correctly


I'm learning the Deeplearning4j library and trying to implement a simple 3-layer neural network with sigmoid activation functions to solve XOR. I've managed to get accurate outputs using ReLU activations with a softmax output, based on some MLP examples I found online; with sigmoid activations, however, the network doesn't seem to fit accurately. What configuration or hyperparameters am I missing, and why isn't my network producing the correct outputs?

    // Imports needed for this snippet (DL4J 0.x / ND4J):
    import org.deeplearning4j.nn.api.OptimizationAlgorithm;
    import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
    import org.deeplearning4j.nn.conf.layers.DenseLayer;
    import org.deeplearning4j.nn.conf.layers.OutputLayer;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.deeplearning4j.nn.weights.WeightInit;
    import org.nd4j.linalg.activations.Activation;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.DataSet;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;

    DenseLayer inputLayer = new DenseLayer.Builder()
            .nIn(2)
            .nOut(3)
            .name("Input")
            .weightInit(WeightInit.ZERO)
            .build();

    DenseLayer hiddenLayer = new DenseLayer.Builder()
            .nIn(3)
            .nOut(3)
            .name("Hidden")
            .activation(Activation.SIGMOID)
            .weightInit(WeightInit.ZERO)
            .build();

    OutputLayer outputLayer = new OutputLayer.Builder()
            .nIn(3)
            .nOut(1)
            .name("Output")
            .activation(Activation.SIGMOID)
            .weightInit(WeightInit.ZERO)
            .lossFunction(LossFunction.MEAN_SQUARED_LOGARITHMIC_ERROR)
            .build();

    NeuralNetConfiguration.Builder nncBuilder = new NeuralNetConfiguration.Builder();
    nncBuilder.iterations(10000);
    nncBuilder.learningRate(0.01);
    nncBuilder.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT);

    NeuralNetConfiguration.ListBuilder listBuilder = nncBuilder.list();
    listBuilder.layer(0, inputLayer);
    listBuilder.layer(1, hiddenLayer);
    listBuilder.layer(2, outputLayer);

    listBuilder.backprop(true);

    MultiLayerNetwork myNetwork = new MultiLayerNetwork(listBuilder.build());
    myNetwork.init();

    INDArray trainingInputs = Nd4j.zeros(4, inputLayer.getNIn());
    INDArray trainingOutputs = Nd4j.zeros(4, outputLayer.getNOut());

    // If 0,0 show 0
    trainingInputs.putScalar(new int[]{0,0}, 0);
    trainingInputs.putScalar(new int[]{0,1}, 0);
    trainingOutputs.putScalar(new int[]{0,0}, 0);

    // If 0,1 show 1
    trainingInputs.putScalar(new int[]{1,0}, 0);
    trainingInputs.putScalar(new int[]{1,1}, 1);
    trainingOutputs.putScalar(new int[]{1,0}, 1);

    // If 1,0 show 1
    trainingInputs.putScalar(new int[]{2,0}, 1);
    trainingInputs.putScalar(new int[]{2,1}, 0);
    trainingOutputs.putScalar(new int[]{2,0}, 1);

    // If 1,1 show 0
    trainingInputs.putScalar(new int[]{3,0}, 1);
    trainingInputs.putScalar(new int[]{3,1}, 1);
    trainingOutputs.putScalar(new int[]{3,0}, 0);

    DataSet myData = new DataSet(trainingInputs, trainingOutputs);
    myNetwork.fit(myData);


    INDArray actualInput = Nd4j.zeros(1,2);
    actualInput.putScalar(new int[]{0,0}, 0);
    actualInput.putScalar(new int[]{0,1}, 0);

    INDArray actualOutput = myNetwork.output(actualInput);
    System.out.println("myNetwork Output " + actualOutput);
    //Output is producing 1.00. Should be 0.0

Solution

  • In general, I'm going to point you to: https://deeplearning4j.org/troubleshootingneuralnets

    A few concrete tips: never use zero weight initialization. With all weights at zero, every unit in a layer computes the same output and receives the same gradient update, so the network can never break symmetry and learn distinct features. There is a reason we don't do that in our examples, which I highly suggest you start from rather than writing things from scratch: https://github.com/deeplearning4j/dl4j-examples
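
    As a sketch of that fix (assuming the same 0.x API the question uses), here is the question's hidden layer with Xavier initialization swapped in for zeros; WeightInit.XAVIER is one common non-zero choice, but any symmetry-breaking initializer works:

        // Sketch only: hidden layer from the question, with Xavier init so
        // weights start at small, distinct random values. With all-zero
        // weights, every unit computes the same output and receives the same
        // gradient update, so the layer never breaks symmetry.
        DenseLayer hiddenLayer = new DenseLayer.Builder()
                .nIn(3)
                .nOut(3)
                .name("Hidden")
                .activation(Activation.SIGMOID)
                .weightInit(WeightInit.XAVIER) // non-zero, scaled random init
                .build();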

    For the output layer, why not just use binary cross-entropy (XENT) if you're trying to learn XOR: https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/feedforward/xor/XorExample.java
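
    As a sketch, this is the question's output layer with LossFunction.XENT (DL4J's binary cross-entropy) in place of MSLE; note the linked example itself uses a two-unit softmax output with negative log-likelihood, which amounts to the same thing for a two-class target:

        // Sketch only: binary cross-entropy (XENT) pairs naturally with a
        // single sigmoid output unit learning a 0/1 target.
        OutputLayer outputLayer = new OutputLayer.Builder()
                .nIn(3)
                .nOut(1)
                .name("Output")
                .activation(Activation.SIGMOID)
                .weightInit(WeightInit.XAVIER)
                .lossFunction(LossFunction.XENT) // binary cross-entropy
                .build();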

    Of note here: turn off minibatch mode too (see the example above), and see: https://deeplearning4j.org/toyproblems
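
    Putting those together, a minimal sketch of the builder with minibatch mode disabled via miniBatch(false), keeping the question's other hyperparameters; with only four training rows there is no meaningful minibatch statistic to normalize gradients by:

        // Sketch only: same configuration as the question, but with minibatch
        // mode off so gradients are not rescaled as if the 4-row dataset were
        // a sampled minibatch.
        NeuralNetConfiguration.Builder nncBuilder = new NeuralNetConfiguration.Builder();
        nncBuilder.iterations(10000);
        nncBuilder.learningRate(0.01);
        nncBuilder.miniBatch(false); // treat the whole 4-example dataset as one batch
        nncBuilder.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT);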