I got CSV and TEXT format results like the following from clusterdump.
CSV:
0,Sports_38.txt
1,Sports_23.txt
2,Sports_36.txt
3,Sports_13.txt
4,Sports_31.txt,Sports_32.txt
5,Sports_28.txt,Sports_29.txt
6,Sports_2.txt
9,Sports_15.txt
TEXT:
{"identifier":"VL-1","r":[],"c":[...,"n":7}
Top Terms:
什 => 15.829998016357422
利物浦 => 13.629814147949219
克 => 11.317766189575195
格 => 10.938775062561035
特 => 10.842317581176758
尔 => 10.447234153747559
切尔西 => 9.742402076721191
比赛 => 8.247735023498535
表现 => 7.909337520599365
批评 => 7.462332725524902
I noticed that VL-1 has only one point in the CSV file but 7 points in the TEXT file (VL-1's "n" equals 7).
Why did some points disappear? And how can I get the cluster of every point?
Thanks a lot.
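For anyone who wants to check the same mismatch, here is a rough Python sketch that counts the points listed per cluster in the CSV output so the totals can be compared with the "n" values in the TEXT output. It assumes the first field of each CSV line is the cluster id and the remaining fields are point names, which is how the output above looks. The function name count_points_per_cluster is made up for illustration.

```python
import csv
import io


def count_points_per_cluster(csv_text):
    """Map each cluster id (first field) to the number of point names after it."""
    counts = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        cluster_id, points = row[0], row[1:]
        counts[cluster_id] = counts.get(cluster_id, 0) + len(points)
    return counts


# A few lines copied from the CSV output above.
sample = """0,Sports_38.txt
1,Sports_23.txt
4,Sports_31.txt,Sports_32.txt
"""

print(count_points_per_cluster(sample))
```

If a cluster's count here is smaller than its "n" in the TEXT output, some points were left out of the CSV.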
I also got empty clusteredPoints when the data was a little larger.
I finally found the reason myself.
clusterClassificationThreshold, the 8th parameter of Kmeans.run, should be 0 (Mahout 1.0).
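As far as I understand it, the threshold acts as a cutoff: a point is written to clusteredPoints only when its strongest cluster membership weight reaches the threshold, so a high value silently drops points and 0 keeps all of them. The Python sketch below only illustrates that idea. It is not Mahout's code, and the function name, the point names, and the weights are all invented for the example.

```python
def classify(points, threshold):
    """Assign each point to its best cluster, dropping points below the threshold.

    points: dict mapping a point name to a list of per-cluster weights.
    Returns a dict mapping each kept point name to its cluster index.
    """
    assigned = {}
    for name, weights in points.items():
        # Index of the cluster with the largest weight for this point.
        best = max(range(len(weights)), key=weights.__getitem__)
        if weights[best] >= threshold:
            assigned[name] = best
    return assigned


# Invented weights: doc_a is a confident member of cluster 1, doc_b is borderline.
weights = {
    "doc_a": [0.10, 0.90],
    "doc_b": [0.45, 0.55],
}

print(classify(weights, 0.8))  # doc_b is dropped
print(classify(weights, 0.0))  # every point is kept
```

With a threshold of 0 every point passes the check, which matches what I saw: all points show up again in the output.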