I use LibShortText for short-text classification.
I trained a model and used it to get class predictions on my test set by running:
python text-train.py -L 0 -f ./demo/train_file
python text-predict.py ./demo/train_file train_file.model output
The output file contains the score of each class for each test sample. Here is the beginning of the output file:
version: 1
analyzable: 1
text-src: ./demo/train_file
extra-files:
model-id: 22d9e6defd38ed92e45662d576262915d10c3374
Tickets Tickets 1.045974012515694 -0.1533289000025808 -0.142460215262256 -0.1530588765291932 -0.1249182478102407 -0.1190708362082807 -0.06841237067728836 0.04587568197139553 -0.2283616562229066 -0.102238591774343
Stamps Stamps -0.1187719176481736 1.118188003417143 -0.08034439513604429 -0.1973997029054026 -0.06355109135595602 -0.1786639939826796 -0.1169254102259164 -0.01967861752032143 -0.06964465109882922 -0.2732082235438185
Music Music -0.1315596826953709 -0.2641082947449856 1.008713836384851 -0.04068831625284784 -0.1545790157496564 -0.1010212095804389 -0.02069378431571431 -0.02404317930606417 0.008960552873498827 -0.2809809066132714
Jewelry & Watches Jewelry & Watches -0.0749032450936907 -0.1369122108940684 -0.2159355702219642 0.9582440549577076 -0.141187218792264 -0.1290355317490395 -0.04287756450848382 -0.0919782002284954 -0.04312539181047169 -0.0822891216592294
Tickets Tickets 0.9291396425612148 -0.1597595507175184 -0.07086077554348413 -0.07087036006347401 -0.1111802245732816 -0.2329161314957608 -0.07080154336497513 -0.07093153970747144 -0.07096098431125453 -0.07085853278399512
Books Books -0.03482279197164031 -0.02622229736755784 -0.08576360644172253 -0.1209545478269265 0.9735039690597804 -0.02640896142537765 -0.1511226188239169 -0.1785299152500055 -0.1569282110333412 -0.1927510189192921
Tickets Tickets 1.165624491239117 -0.1643444003616841 -0.279795018266336 -0.05911033737681937 -0.1496733471948844 -0.1774767469424229 -0.1806900189575362 -0.05711408596057094 0.06427848575613292 -0.1616990219349959
Art Art -0.07563152438778584 -0.1926345255861422 -0.1379519287608234 -0.1728869014895525 -0.2081235484009353 0.9764371359082827 -0.06097998223834129 -0.06082239643658216 -0.0434090642865785 -0.0239972643215402
Art Art -0.21374038053991 0.0146962630542977 -0.02279914632208601 -0.001108284295731699 -0.2621058759589903 1.016592310148241 0.01436347343617804 -0.04476369315079338 -0.1246095742882179 -0.3765250920829869
Books Books -0.08063364674726788 -0.08053738921453879 -0.08032365427931695 -0.1496633152184083 0.9195583554164264 -0.08011940998873018 -0.08053175336913043 -0.16302082274963 -0.1105339242133948 -0.09419443963601073
How can I tell which class each score corresponds to?
I know I could infer it by comparing the predicted class with the maximum score across several test samples, but I'm hoping there is a more direct way.
The labels member of the PredictionResult returned from predict_text() contains the ordering. So a small addition to classifier_impl.py will expose it as column headers in the output file:
*** libshorttext-1.1/libshorttext/classifier/classifier_impl.py.orig
--- libshorttext-1.1/libshorttext/classifier/classifier_impl.py
***************
*** 113,118 ****
--- 113,125 ----
  fmt = '\t{{0:{0}}}'.format(fmt)
  for i in range(len(self.predicted_y)):
+     if i == 0:
+         label_text = '{0: <18}'.format('Predicted')
+         label_text += '{0: <18}'.format('True class')
+         for l in self.labels:
+             label_text += " {0: <18}".format(l)
+         fout.write(label_text + "\n")
+
      fout.write("{py}\t{y}".format(py = self.predicted_y[i], y = self.true_y[i]))
      for v in self.decvals[i]:
          fout.write(fmt.format(v))
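
If patching the library is not an option, the workaround mentioned in the question (matching the predicted class against the highest score) can be automated. The following is only a rough sketch, not part of LibShortText: infer_label_order is a hypothetical helper, and it assumes that data rows are tab-separated (predicted label, true label, then one score per class) and that the predicted class is the one with the largest score. A column whose class is never predicted in the file stays unmapped.

```python
def infer_label_order(lines):
    """Return {score column index: label}, inferred from the argmax of each row.

    Assumes tab-separated rows: predicted label, true label, then the scores.
    """
    order = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # header lines such as "version: 1" carry no scores
        predicted = fields[0]
        try:
            scores = [float(v) for v in fields[2:]]
        except ValueError:
            continue  # not a data row
        # Index of the largest score in this row.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Record the label for this column; complain if two labels collide.
        if order.setdefault(best, predicted) != predicted:
            raise ValueError(
                "column {0} maps to both {1!r} and {2!r}".format(
                    best, order[best], predicted))
    return order


# Abbreviated rows in the shape of the output shown in the question.
sample = [
    "version: 1",
    "Tickets\tTickets\t1.05\t-0.15\t-0.14",
    "Stamps\tStamps\t-0.12\t1.12\t-0.08",
    "Music\tMusic\t-0.13\t-0.26\t1.01",
]
print(infer_label_order(sample))
```

To run it on a real prediction file, pass the open file object instead of the sample list, e.g. `with open("output") as f: infer_label_order(f)`.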