I have two questions on deeplearning4j that are somewhat related.
Part of the code:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .iterations(1)
        .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
        .learningRate(learningRate)
        .updater(org.deeplearning4j.nn.conf.Updater.NESTEROVS).momentum(0.9)
        .list()
        .layer(0, new DenseLayer.Builder()
                .nIn(numInputs)
                .nOut(numHiddenNodes)
                .weightInit(WeightInit.XAVIER)
                .activation("relu")
                .build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .weightInit(WeightInit.XAVIER)
                .activation("softmax")
                .nIn(numHiddenNodes)
                .nOut(numOutputs)
                .build())
        .pretrain(false).backprop(true).build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(10));

for (int n = 0; n < nEpochs; n++) {
    model.fit(trainIter);
}
Evaluation eval = new Evaluation(numOutputs);
while (testIter.hasNext()) {
    DataSet t = testIter.next();
    INDArray features = t.getFeatureMatrix();
    System.out.println("Input features: " + features);
    INDArray labels = t.getLabels();
    INDArray predicted = model.output(features, false);
    System.out.println("Predicted output: " + predicted);
    System.out.println("Desired output: " + labels);
    eval.eval(labels, predicted);
    System.out.println();
}
System.out.println(eval.stats());
Output from running the code above:
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: [1.00, 0.00]
Desired output: [1.00, 0.00]
**What I want the output to look like (i.e. a one-value probability):**
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: 0.14
Desired output: 0.0
I will answer your questions inline, but first a note: I would suggest taking a look at our docs and examples: https://github.com/deeplearning4j/dl4j-examples and http://deeplearning4j.org/quickstart
A hard 100% 0-or-1 output is just a badly tuned neural net; that's not at all how softmax works. A softmax by default returns probabilities, and your net has simply saturated. Look at updating DL4J too. I'm not sure what version you're on, but we haven't used strings in activations for at least a year now. You seem to have skipped a lot of steps when starting with us, so I'll reiterate: at least take a look at the links above as a starting point rather than using year-old code.
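For reference, the same configuration against a newer API looks roughly like this. This is a sketch assuming a 1.0.0-beta-era DL4J release; exact builder methods vary by version, and iterations()/pretrain()/backprop() are no longer needed in recent releases:

// Sketch of the same two-layer config on a newer DL4J API
// (assumption: a 1.0.0-beta-era release; builder details differ across versions).
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.weights.WeightInit;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(seed)
        .updater(new Nesterovs(learningRate, 0.9)) // replaces Updater.NESTEROVS + momentum()
        .list()
        .layer(0, new DenseLayer.Builder()
                .nIn(numInputs)
                .nOut(numHiddenNodes)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.RELU)       // enum, not the string "relu"
                .build())
        .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(numHiddenNodes)
                .nOut(numOutputs)
                .weightInit(WeightInit.XAVIER)
                .activation(Activation.SOFTMAX)
                .build())
        .build();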
What you're seeing there is just standard deep learning 101, so the advice I'm about to give can be found all over the internet and applies to any deep learning software. A two-label softmax sums each row to 1. If you want one label, use a sigmoid with a single output and a different loss function, as sketched below. We use softmax because it works for any number of outputs: all you have to do is change the number of outputs, rather than having to change the loss function and the activation function on top of that.
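Concretely, keeping everything else in your config the same, the output layer would become something like the following. This is a sketch that assumes your labels are a single column of 0/1 values (not one-hot) and a DL4J version with the Activation enum; numHiddenNodes comes from your snippet above:

// Sketch: single-probability output instead of a two-way softmax.
// Assumption: labels are one column of 0/1 values rather than one-hot vectors.
.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.XENT) // binary cross-entropy
        .weightInit(WeightInit.XAVIER)
        .activation(Activation.SIGMOID) // one value in [0, 1]
        .nIn(numHiddenNodes)
        .nOut(1)                        // single output unit: P(label == 1)
        .build())

With that layer, model.output(features, false) returns a one-column INDArray per example, so something like predicted.getDouble(0) gives you the single probability you described in your desired output.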