I started clustering my data with open-source code using Java and the Weka library. It runs correctly when the dataset is in .arff format, but I want to use the MovieLens dataset (to cluster users by their demographic information). The file is named "u.user"; you can find the file description here: http://files.grouplens.org/datasets/movielens/ml-100k-README.txt
Here is my code:
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Clustering {
    public static void main(String[] args) throws Exception {
        // load dataset
        String dataset = "C:/Users/DELL/Desktop/work/u.user";
        DataSource source = new DataSource(dataset);
        // get Instances object
        Instances data = source.getDataSet();
        // new instance of the clusterer (k-means)
        SimpleKMeans model = new SimpleKMeans();
        // number of clusters
        model.setNumClusters(4);
        // set distance function
        //model.setDistanceFunction(new weka.core.ManhattanDistance());
        // build the clusterer
        model.buildClusterer(data);
        System.out.println(model);
    }
}
When I run it, this error is displayed:
Exception in thread "main" java.io.IOException: File not found : C:\Users\DELL\Desktop\work\u.names
at weka.core.converters.C45Loader.setSource(C45Loader.java:190)
at weka.core.converters.AbstractFileLoader.setFile(AbstractFileLoader.java:90)
at weka.core.converters.ConverterUtils$DataSource.reset(ConverterUtils.java:306)
at weka.core.converters.ConverterUtils$DataSource.<init>(ConverterUtils.java:141)
at Clustering.main(Clustering.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Process finished with exit code 1
I am sure it is because of the file extension, because when I use another file with the .arff extension it works. Can you help me cluster my data?
You also need to pay attention to the file format, not only the extension: convert the dataset to Weka's ARFF format. For your u.user data, change the extension to .arff (e.g. user.arff) and the content to something like:
@RELATION user
@ATTRIBUTE id INTEGER % an identifier, not useful for clustering
@ATTRIBUTE age INTEGER
@ATTRIBUTE gender {M,F}
@ATTRIBUTE occupation {administrator,artist,doctor,educator,engineer,entertainment,executive,healthcare,homemaker,lawyer,librarian,marketing,none,other,programmer,retired,salesman,scientist,student,technician,writer} % from u.occupation
@ATTRIBUTE zipcode STRING
@DATA
1,24,M,technician,85711
2,53,F,other,94043
3,23,M,writer,32067
4,24,M,technician,43537
5,33,F,other,15213
6,42,M,executive,98101
7,57,M,administrator,91344
8,36,M,administrator,05201
...
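The conversion does not have to be done by hand. Below is a minimal sketch in plain Java (no Weka classes), assuming u.user is pipe-delimited (`1|24|M|technician|85711`) as in the ml-100k download; the class name `UserToArff` and its methods are hypothetical. It emits the same header as the ARFF example and turns each `|`-separated line into a comma-separated `@DATA` row, keeping values such as zip code `05201` as text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper: converts MovieLens 100K u.user (pipe-delimited) to ARFF. */
public class UserToArff {

    // Same attributes as the ARFF example; occupation values come from u.occupation.
    public static final String HEADER = String.join("\n",
        "@RELATION user",
        "@ATTRIBUTE id INTEGER",
        "@ATTRIBUTE age INTEGER",
        "@ATTRIBUTE gender {M,F}",
        "@ATTRIBUTE occupation {administrator,artist,doctor,educator,engineer,entertainment,"
            + "executive,healthcare,homemaker,lawyer,librarian,marketing,none,other,programmer,"
            + "retired,salesman,scientist,student,technician,writer}",
        "@ATTRIBUTE zipcode STRING",
        "@DATA");

    /** Turns "1|24|M|technician|85711" into "1,24,M,technician,85711". */
    public static String toArffRow(String line) {
        // Assumed layout: user id | age | gender | occupation | zip code
        String[] fields = line.trim().split("\\|");
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + line);
        }
        return String.join(",", fields);
    }

    /** Builds the complete ARFF text: header first, then one data row per user. */
    public static String convert(List<String> lines) {
        StringBuilder sb = new StringBuilder(HEADER).append('\n');
        for (String line : lines) {
            if (!line.trim().isEmpty()) { // skip blank lines
                sb.append(toArffRow(line)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path to u.user, args[1] = path of the .arff file to write
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        List<String> lines = Files.readAllLines(in, StandardCharsets.ISO_8859_1);
        Files.write(out, convert(lines).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Run it as `java UserToArff u.user user.arff` and give the resulting user.arff path to `DataSource`.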
You should then be able to parse the dataset into a weka.core.Instances object. Unfortunately, however, SimpleKMeans will reject your data with:
weka.core.UnsupportedAttributeTypeException: weka.clusterers.SimpleKMeans: Cannot handle string attributes!
So you are left with (at least) 3 options:
id
(weka.clusterers.HierarchicalClusterer)
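One way to get past the "Cannot handle string attributes" error is to drop the string attribute before Weka ever sees the file. Below is a minimal sketch in plain Java, assuming you keep only age, gender and occupation and that u.user is pipe-delimited; the class name `UserDemographicsArff` is hypothetical. It discards the id and the string-valued zip code, so every remaining attribute is numeric or nominal, which SimpleKMeans can handle.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: writes an ARFF file with only age, gender and occupation. */
public class UserDemographicsArff {

    /** Returns the ARFF file as a list of lines (header, then one row per user). */
    public static List<String> convert(List<String> userLines) {
        List<String> out = new ArrayList<>();
        out.add("@RELATION user_demographics");
        out.add("@ATTRIBUTE age NUMERIC");
        out.add("@ATTRIBUTE gender {M,F}");
        out.add("@ATTRIBUTE occupation {administrator,artist,doctor,educator,engineer,"
            + "entertainment,executive,healthcare,homemaker,lawyer,librarian,marketing,none,"
            + "other,programmer,retired,salesman,scientist,student,technician,writer}");
        out.add("@DATA");
        for (String line : userLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Assumed layout: user id | age | gender | occupation | zip code
            String[] f = line.trim().split("\\|");
            if (f.length != 5) {
                throw new IllegalArgumentException("Expected 5 fields: " + line);
            }
            // Keep age, gender, occupation; drop the id (f[0]) and the string zip code (f[4]).
            out.add(f[1] + "," + f[2] + "," + f[3]);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0] = path to u.user, args[1] = path of the .arff file to write
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1);
        Files.write(Paths.get(args[1]), convert(lines), StandardCharsets.ISO_8859_1);
    }
}
```

Point the `DataSource` in the question at the generated file instead of u.user. If you would rather do this inside Weka, the `weka.filters.unsupervised.attribute.Remove` filter can delete attributes from an already loaded `Instances` object.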
Good luck!