I downloaded stanford-corenlp-full-2015-12-09. And I created a training model with the following command:
java -mx8g edu.stanford.nlp.sentiment.SentimentTraining -numHid 25 -trainPath train.txt -devPath dev.txt -train -model model.ser.gz
When I finished training, I found many files in my directory. the model list
Then I used the evaluation tool from the package and I ran the code like this:
java -cp * edu.stanford.nlp.sentiment.Evaluate -model model-0024-79.82.ser.gz -treebank test.txt
The test.txt was from trainDevTestTrees_PTB.zip. This is the result about code:
F:\trainDevTestTrees_PTB\trees>java -cp * edu.stanford.nlp.sentiment.Evaluate -model model-0024-79.82.ser.gz -treebank test.txt
EVALUATION SUMMARY
Tested 82600 labels
65331 correct
17269 incorrect
0.790932 accuracy
Tested 2210 roots
890 correct
1320 incorrect
0.402715 accuracy
Label confusion matrix
Guess/Gold 0 1 2 3 4 Marg. (Guess)
0 551 340 87 32 6 1016
1 956 5348 2476 686 191 9657
2 354 2812 51386 3097 467 58116
3 146 744 2525 6804 1885 12104
4 1 11 74 379 1242 1707
Marg. (Gold) 2008 9255 56548 10998 3791
0 prec=0.54232, recall=0.2744, spec=0.99423, f1=0.36442
1 prec=0.5538, recall=0.57785, spec=0.94125, f1=0.56557
2 prec=0.8842, recall=0.90871, spec=0.74167, f1=0.89629
3 prec=0.56213, recall=0.61866, spec=0.92598, f1=0.58904
4 prec=0.72759, recall=0.32762, spec=0.9941, f1=0.4518
Root label confusion matrix
Guess/Gold 0 1 2 3 4 Marg. (Guess)
0 50 60 12 9 3 134
1 161 370 147 94 36 808
2 31 103 102 60 32 328
3 36 97 123 305 265 826
4 1 3 5 42 63 114
Marg. (Gold) 279 633 389 510 399
0 prec=0.37313, recall=0.17921, spec=0.9565, f1=0.24213
1 prec=0.45792, recall=0.58452, spec=0.72226, f1=0.51353
2 prec=0.31098, recall=0.26221, spec=0.87589, f1=0.28452
3 prec=0.36925, recall=0.59804, spec=0.69353, f1=0.45659
4 prec=0.55263, recall=0.15789, spec=0.97184, f1=0.24561
Approximate Negative label accuracy: 0.638817
Approximate Positive label accuracy: 0.697140
Combined approximate label accuracy: 0.671925
Approximate Negative root label accuracy: 0.702851
Approximate Positive root label accuracy: 0.742574
Combined approximate root label accuracy: 0.722680
The accuracy about fine-grained and positive/negative was quite different from the paper "Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y. and Potts, C., 2013, October. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP) (Vol. 1631, p. 1642)." The paper states the accuracy about fine-grained and positive/negative is higher than mine. The records in the paper
Were there any problems with my operation? Why was my result different from the paper?
The short answer is that the paper used a different system written in Matlab. The Java system does not match the paper. Though we do distribute the binary model we trained in Matlab with the English models jar. So you can RUN the binary model with Stanford CoreNLP, but you cannot TRAIN a binary model with similar performance with Stanford CoreNLP at this time.