python pandas machine-learning classification pybrain

pybrain - ClassificationDataSet - how to understand the output when using SoftmaxLayer

I am trying to build my first classifier with a PyBrain neural network, using the specialized ClassificationDataSet, and I am not sure I fully understand how it works.

I have a pandas DataFrame with 6 feature columns and 1 class-label column (Survived, which is just 0 or 1).

I build a dataset out of it:

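from pybrain.datasets import ClassificationDataSet

def build_dataset(df):  # hypothetical wrapper name
    # 6 input features, 1 target value (the class index), 2 classes
    ds = ClassificationDataSet(6, 1, nb_classes=2)
    for i in df[['Gender', 'Pclass', 'AgeFill', 'FamilySize', 'FarePerPerson', 'Deck', 'Survived']].values:
        ds.addSample(tuple(i[:-1]), i[-1])  # first 6 values are features, last is the label
    # replace each single class index with a 2-element one-of-many target
    ds._convertToOneOfMany()
    return ds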

OK, I check what the dataset looks like:

for i, m in ds:
    print((i, m))  # a bare expression inside a loop displays nothing


(array([ 1.,  3.,  2.,  2.,  1.,  8.]), array([1, 0]))
(array([ 0.,  1.,  1.,  2.,  0.,  2.]), array([0, 1]))

And I already have a problem: what do [1, 0] and [0, 1] mean? Are they just the '0' or '1' of the original 'Survived' column? How do I get back to the original values?

Later, after I have trained my network:

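from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure.modules import TanhLayer, SoftmaxLayer
from pybrain.supervised.trainers import BackpropTrainer
net = buildNetwork(6, 6, 2, hiddenclass=TanhLayer, bias=True, outclass=SoftmaxLayer)
trainer = BackpropTrainer(net, ds)
trainer.trainEpochs(10)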

I then activate it on another dataset (the one I actually want to classify) and get a pair of activation values for each sample, one per output neuron. But how do I tell which output neuron corresponds to which original class? This is probably obvious, but unfortunately I cannot work it out from the docs.
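For reference, a softmax output layer turns the raw values of the output neurons into non-negative numbers that sum to 1, so each activation pair can be read as one score per class. The sketch below is a plain NumPy illustration of the softmax function (not PyBrain's own code); the input values `z` are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: shift by the max before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical raw values for the 2 output neurons
z = np.array([0.3, 1.5])
out = softmax(z)

print(out)           # one score per output neuron
print(out.sum())     # the scores sum to 1 (up to rounding)
print(out.argmax())  # 1: index of the most active neuron
```

Taking `argmax()` of such a pair picks the neuron with the highest score, which is why the index of the neuron matters.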

Solution

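OK, it looks like PyBrain encodes the class by position: the index of the 1 in the one-of-many target is the original class label, so (1, 0) means class 0 and (0, 1) means class 1. Since the network is trained against these targets, output neuron 0 likewise corresponds to class 0 and output neuron 1 to class 1.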
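The position-based encoding can be reproduced in a few lines of NumPy. This is a minimal sketch of one-hot (one-of-many) encoding, with a hypothetical helper `to_one_of_many`; it is not PyBrain's implementation. The labels 0 and 1 used here are the ones implied by the two targets printed in the question, [1, 0] and [0, 1]. As a side note, the docstring of PyBrain's `_convertToOneOfMany()` says it keeps the old targets in a field named `class`, so `ds['class']` should also return the original labels.

```python
import numpy as np

def to_one_of_many(labels, n_classes):
    """Hypothetical helper: row i gets a 1 at column labels[i], 0 elsewhere."""
    labels = np.asarray(labels, dtype=int)
    targets = np.zeros((len(labels), n_classes), dtype=int)
    targets[np.arange(len(labels)), labels] = 1
    return targets

# Survived labels matching the two sample rows printed in the question
survived = [0, 1]
targets = to_one_of_many(survived, n_classes=2)
print(targets)                 # [[1 0]
                               #  [0 1]]
print(targets.argmax(axis=1))  # [0 1], the original labels again
```

`argmax` is the inverse of this encoding: it returns the column holding the 1, which is the original class index.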

To get back the original 0 or 1 label, use argmax(). For example, if I already have a trained network and want to validate it on the same data I used for training, I can do this:

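correct = 0
total = 0
for inProp, num in ds:
    # index of the most active output neuron = predicted class
    out = net.activate(inProp).argmax()
    # index of the 1 in the one-of-many target = actual class
    if out == num.argmax():
        correct += 1
    total += 1
res = float(correct) / total  # avoid integer division under Python 2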

Here inProp is the array of input values passed to activate(), num is the expected two-neuron output (either (0, 1) or (1, 0)), and num.argmax() translates it back into the original 0 or 1 label.
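The same accuracy check can be written without an explicit counter once the activations are collected into an array (for example with a list comprehension over `net.activate(...)`). The sketch below uses hypothetical softmax outputs and targets and a hypothetical `accuracy` helper; it only compares the argmax of each row, exactly like the loop above.

```python
import numpy as np

def accuracy(outputs, targets):
    """Fraction of rows where the most active output neuron matches the target's 1."""
    predicted = np.asarray(outputs).argmax(axis=1)
    actual = np.asarray(targets).argmax(axis=1)
    return float(np.mean(predicted == actual))

# Hypothetical softmax outputs for 4 samples and their one-of-many targets
outputs = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4], [0.1, 0.9]]
targets = [[1, 0], [0, 1], [0, 1], [0, 1]]
print(accuracy(outputs, targets))  # 0.75
```

Three of the four predictions match, so the result is 0.75; `np.mean` on a boolean array also sidesteps the integer-division pitfall.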

I might be wrong since this is a pure heuristic, but it works in my example.