I am trying to classify binary data. In the data file, the class labels {0,1} are converted to {-1,1}. The data has 21 features, all of them categorical. I am using a neural network for training. The training command is:
vw -d train.vw --cache_file data --passes 5 -q sd -q ad -q do -q fd --binary -f neuralmodel --nn 22
I create the raw prediction file with:
vw -d test.vw -t -i neuralmodel -r raw.txt
And the normal prediction file with:
vw -d test.vw -t -i neuralmodel -p out.txt
The first five lines of the raw file are:
0:-0.861075,-0.696812 1:-0.841357,-0.686527 2:0.796014,0.661809 3:1.06953,0.789289 4:-1.23823,-0.844951 5:0.886767,0.709793 6:2.02206,0.965555 7:-2.40753,-0.983917 8:-1.09056,-0.797075 9:1.22141,0.84007 10:2.69466,0.990912 11:2.64134,0.989894 12:-2.33309,-0.981359 13:-1.61462,-0.923839 14:1.54888,0.913601 15:3.26275,0.995055 16:2.17991,0.974762 17:0.750114,0.635229 18:2.91698,0.994164 19:1.15909,0.820746 20:-0.485593,-0.450708 21:2.00432,0.964333 -0.496912
0:-1.36519,-0.877588 1:-2.83699,-0.993155 2:-0.257558,-0.251996 3:-2.12969,-0.97213 4:-2.29878,-0.980048 5:2.70791,0.991148 6:1.31337,0.865131 7:-2.00127,-0.964116 8:-2.14167,-0.972782 9:2.50633,0.986782 10:-1.09253,-0.797788 11:2.29477,0.97989 12:-1.67385,-0.932057 13:-0.740598,-0.629493 14:0.829695,0.680313 15:3.31954,0.995055 16:3.44069,0.995055 17:2.48612,0.986241 18:1.32241,0.867388 19:1.97189,0.961987 20:1.19584,0.832381 21:1.65151,0.929067 -0.588528
0:0.908454,0.72039 1:-2.48134,-0.986108 2:-0.557337,-0.505996 3:-2.15072,-0.973263 4:-1.77706,-0.944375 5:0.202272,0.199557 6:2.37479,0.982839 7:-1.97478,-0.962201 8:-1.78124,-0.944825 9:1.94016,0.959547 10:-1.67845,-0.932657 11:2.54895,0.987855 12:-1.60502,-0.92242 13:-2.32369,-0.981008 14:1.59895,0.921511 15:2.02658,0.96586 16:2.55443,0.987987 17:3.47049,0.995055 18:1.92482,0.958313 19:1.47773,0.901044 20:-3.60913,-0.995055 21:3.56413,0.995055 -0.809399
0:-2.11677,-0.971411 1:-1.32759,-0.868656 2:2.59003,0.988807 3:-0.198721,-0.196146 4:-2.51631,-0.987041 5:0.258549,0.252956 6:1.60134,0.921871 7:-2.28731,-0.97959 8:-2.89953,-0.993958 9:-0.0972349,-0.0969177 10:3.1409,0.995055 11:1.62083,0.924746 12:-2.30097,-0.980134 13:-2.05674,-0.967824 14:1.6744,0.932135 15:1.85612,0.952319 16:2.7231,0.991412 17:1.97199,0.961995 18:3.47125,0.995055 19:0.603527,0.539567 20:1.25539,0.84979 21:2.15267,0.973368 -0.494474
0:-2.21583,-0.97649 1:-2.16823,-0.974171 2:2.00711,0.964528 3:-1.84079,-0.95087 4:-1.27159,-0.854227 5:-0.0841799,-0.0839635 6:2.24566,0.977836 7:-2.19458,-0.975482 8:-2.42779,-0.98455 9:0.39883,0.378965 10:1.32133,0.86712 11:1.87572,0.95411 12:-2.22585,-0.976951 13:-2.04512,-0.96708 14:1.52652,0.909827 15:1.98228,0.962755 16:2.37265,0.982766 17:1.73726,0.939908 18:2.315,0.980679 19:-0.08135,-0.081154 20:1.39248,0.883717 21:1.5889,0.919981 -0.389856
The first five lines of the (normal) prediction file are:
-0.496912
-0.588528
-0.809399
-0.494474
-0.389856
I have compared this (normal) output with the raw output, and I notice that the last float value on each of the five raw lines is the same as the corresponding value above.
I would like to understand both the raw output and the normal output. Does the fact that each line holds 22 pairs of values have something to do with the 22 neurons? How do I interpret the output as a value in [-1,1], and why is a sigmoid function needed to convert either of the above to probabilities? I will be grateful for any help.
For binary classification, you should use a suitable loss function (--loss_function=logistic or --loss_function=hinge). The --binary switch just makes sure that the reported loss is the 0/1 loss (but you cannot optimize for 0/1 loss directly; the default loss function is --loss_function=squared).
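To make the difference between these losses concrete, here is a minimal Python sketch using the standard textbook definitions for labels in {-1,1}. The function names are my own, and the label of -1 and the second prediction of 0.3 are made up for illustration; only -0.496912 comes from the question.

```python
import math

def squared_loss(label, prediction):
    # Default VW loss: penalizes any distance from the label.
    return (label - prediction) ** 2

def logistic_loss(label, prediction):
    # Smooth classification loss for labels in {-1, 1}.
    return math.log(1.0 + math.exp(-label * prediction))

def hinge_loss(label, prediction):
    # Zero once the prediction is on the correct side with margin >= 1.
    return max(0.0, 1.0 - label * prediction)

def zero_one_loss(label, prediction):
    # What --binary reports: 1 for a wrong sign, 0 otherwise.
    predicted_label = 1 if prediction > 0 else -1
    return 0.0 if predicted_label == label else 1.0

label = -1  # hypothetical true label, not taken from the question
for prediction in (-0.496912, 0.3):
    print(prediction,
          squared_loss(label, prediction),
          logistic_loss(label, prediction),
          hinge_loss(label, prediction),
          zero_one_loss(label, prediction))
```

The 0/1 loss only changes when the sign of the prediction flips, which is why it cannot be optimized directly by gradient descent, while the other three vary smoothly (or piecewise linearly) with the prediction.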
I recommend trying --nn as one of the last steps when tuning the VW parameters. Usually, it improves the results only a little bit, and the optimal number of units in the hidden layer is quite small (--nn 1, --nn 2 or --nn 3). You can also try adding direct connections between the input and output layers with --inpass.
Note that --nn always uses tanh as the sigmoid (activation) function for the hidden layer, and only one hidden layer is possible (this is hardcoded in nn.cc).
If you want to get probabilities (real numbers in [0,1]), use vw -d test.vw -t -i neuralmodel --link=logistic -p probabilities.txt. If you want the output to be a real number in [-1,1], use --link=glf1.
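As a rough illustration of what the two links do to an internal prediction, here is a small Python sketch. It assumes that --link=logistic applies 1/(1+exp(-x)) and that --link=glf1 applies 2/(1+exp(-x)) - 1, which is my understanding of these options; the function names are my own. The scores are the five predictions from the question. Since that model was trained with the default squared loss, the resulting numbers only show the transformation and should not be read as calibrated probabilities.

```python
import math

def logistic_link(score):
    # Maps any real score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

def glf1_link(score):
    # Maps any real score into (-1, 1); algebraically equal to tanh(score / 2).
    return 2.0 / (1.0 + math.exp(-score)) - 1.0

# Internal predictions copied from the question's prediction file.
scores = [-0.496912, -0.588528, -0.809399, -0.494474, -0.389856]

for score in scores:
    print(score, round(logistic_link(score), 6), round(glf1_link(score), 6))
```

A score of 0 maps to probability 0.5 under the logistic link and to 0 under glf1, so the sign of the internal prediction still decides the class in both cases.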
Without --link and --binary, the --pred output contains the internal predictions (in the range [-50, 50] when the logistic or hinge loss function is used).
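As for the --nn --raw question, your guess is correct: the 22 pairs of numbers correspond to the 22 neurons, and the last number is the final (internal) prediction. Judging by your sample lines, each pair appears to hold the input (pre-activation sum) and the output of one hidden unit rather than a bias: the second number matches tanh of the first, e.g. tanh(-0.861075) ≈ -0.696812. The largest outputs seem to level off at 0.995055 in your sample, which may be due to an approximation or clipping in the tanh implementation, but that part is a guess.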
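One way to check how the pairs relate is to parse a raw line and compare the second number of each pair with tanh of the first. The sketch below is only an illustration: the parser is my own, and the input is a shortened version of the first raw line from the question (first five pairs plus the trailing value).

```python
import math

# Shortened copy of the first raw line from the question:
# the first five index:first,second pairs and the trailing final prediction.
raw_line = ("0:-0.861075,-0.696812 1:-0.841357,-0.686527 2:0.796014,0.661809 "
            "3:1.06953,0.789289 4:-1.23823,-0.844951 -0.496912")

def parse_raw_line(line):
    """Split a raw line into (index, first, second) triples and the final prediction."""
    *pair_tokens, final_token = line.split()
    units = []
    for token in pair_tokens:
        index, values = token.split(":")
        first, second = (float(v) for v in values.split(","))
        units.append((int(index), first, second))
    return units, float(final_token)

units, final_prediction = parse_raw_line(raw_line)
for index, first, second in units:
    # Print the reported second value next to tanh of the first value.
    print(index, first, second, round(math.tanh(first), 6))
print("final prediction:", final_prediction)
```

For these five units the second value agrees with tanh of the first to about four decimal places, which suggests that each pair is a hidden unit's input and its tanh output, and that the trailing value is the output-layer prediction written to the normal prediction file.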