The dataset I used to build the classifier has only two attributes: the first is a string (the comment) and the second is nominal (the class). The dataset is too big to load on the server, so I would like to use the model to classify new instances without loading the dataset. For example, say I create a new instance from a user comment:
String usercomment;
Instance instance = new Instance(2);
instance.setValue(0, usercomment);
instance.setMissing(1);
I know I have to set the dataset for the instance, but I don't want to load it. How can I create a dummy dataset with the same attributes for the instance? Also, I am using an old Weka library, so I think I need to use FastVector.
You do not need to load the dataset to classify new instances. Train your model on the dataset and save it; later, load the saved model to classify new instances. See Saving and loading models.
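After loading your model, you can classify instances; see Classifying instances in Use Weka in your Java code. The instance only needs the attribute definitions, so the second example below reads a header-only ARFF file (weather.nominal.only.header) instead of the full dataset.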
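To illustrate why the training data is not needed at prediction time, here is a minimal sketch using only the Java standard library. It relies on plain Java object serialization, which is the mechanism Weka's SerializationHelper wraps. MajorityLabelModel and ModelRoundTripSketch are hypothetical names invented for this illustration and are not part of Weka; the "model" is deliberately trivial.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

// Hypothetical stand-in for a trained classifier; NOT a Weka class.
class MajorityLabelModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private String majorityLabel;

    // "Training" only remembers the most frequent class label.
    void train(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        int best = 0;
        for (String label : labels) {
            int n = counts.merge(label, 1, Integer::sum);
            if (n > best) {
                best = n;
                majorityLabel = label;
            }
        }
    }

    // Prediction uses only the state stored in the model object.
    String classify(String comment) {
        return majorityLabel;
    }
}

class ModelRoundTripSketch {
    // Write the trained model to disk with standard Java serialization.
    static void save(Path file, Serializable model) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the model back; the training data is never touched here.
    static Object load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("model class is not on the classpath", e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "majority.model");

        MajorityLabelModel trained = new MajorityLabelModel();
        trained.train(Arrays.asList("yes", "yes", "no")); // training data used once
        save(file, trained);

        // Later, possibly in another process: only the model file is needed.
        MajorityLabelModel loaded = (MajorityLabelModel) load(file);
        System.out.println(loaded.classify("some user comment")); // prints "yes"
    }
}
```

The same idea applies to a real Weka classifier: everything it learned lives inside the serialized object, so the server only needs the model file plus the attribute definitions.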
I wrote example code in the following Git repository: WekaExamples. I am also pasting the code here, but the working example is in that repository. You can run it from the command line:
gradlew loadArffAndTrainModelExample1 loadModelAndTestExampleInstance1
loadArff
String datasetName = "weather.nominal";
Instances data = DataSetHelper.getInstanceFromFile("data/" + datasetName + ".arff");
// weka.classifiers.trees.J48 -C 0.25 -M 2
String classifierFullName = "weka.classifiers.trees.J48";
String optionString = " -C 0.25 -M 2";
AbstractClassifier classifier = (AbstractClassifier) Class.forName(classifierFullName).newInstance();
classifier.setOptions(Utils.splitOptions(optionString));
classifier.buildClassifier(data); // build classifier
String modelFullFileName = Finals.MODELS_SAVE_FOLDER + classifier.getClass().getName() + ".model";
SerializationHelper.write(modelFullFileName, classifier);
loadModelAndTestExampleInstance1
String datasetName = "weather.nominal.only.header";
Instances data = DataSetHelper.getInstanceFromFile("data/" + datasetName + ".arff");
Instance inst = new DenseInstance(data.numAttributes());
inst.setDataset(data);
inst.setValue(0, 1);
inst.setValue(1, 2);
inst.setValue(2, 0);
inst.setValue(3, 1);
println(inst)
Classifier cls = (Classifier) SerializationHelper.read("models/weka.classifiers.trees.J48.model");
double prediction = cls.classifyInstance(inst);
println("prediction as double: " + prediction);
println("prediction as name: " + data.classAttribute().value((int) prediction));
The code output is as follows:
:compileJava UP-TO-DATE
:compileGroovy UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:loadModelAndTestExampleInstance1
overcast,cool,high,FALSE,?
prediction as double: 0.0
prediction as name: yes
BUILD SUCCESSFUL
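The output above shows why a header is still required even though no data rows are loaded: the header holds the label lists that turn the numeric indices passed to setValue, and the double returned by classifyInstance, back into names. The following is a rough stdlib-only sketch of that lookup, not Weka code. NominalHeaderSketch is a hypothetical name, and the value orders are an assumption taken from the standard weather.nominal ARFF file shipped with Weka; they agree with the printed line overcast,cool,high,FALSE,?.

```java
import java.util.*;

// Hypothetical illustration of what a header-only dataset provides; NOT a Weka class.
class NominalHeaderSketch {
    // Assumed label order per attribute, as declared in the standard weather.nominal header.
    static final List<List<String>> ATTRIBUTE_VALUES = Arrays.asList(
        Arrays.asList("sunny", "overcast", "rainy"), // outlook
        Arrays.asList("hot", "mild", "cool"),        // temperature
        Arrays.asList("high", "normal"),             // humidity
        Arrays.asList("TRUE", "FALSE"));             // windy

    // Assumed label order of the class attribute (play).
    static final List<String> CLASS_VALUES = Arrays.asList("yes", "no");

    // Turn the indices given to setValue(...) into labels; the class is left missing.
    static String describe(int[] indices) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indices.length; i++) {
            sb.append(ATTRIBUTE_VALUES.get(i).get(indices[i])).append(',');
        }
        return sb.append('?').toString();
    }

    // Turn the double returned by the classifier into a class label.
    static String className(double prediction) {
        return CLASS_VALUES.get((int) prediction);
    }

    public static void main(String[] args) {
        System.out.println(describe(new int[] {1, 2, 0, 1})); // prints "overcast,cool,high,FALSE,?"
        System.out.println(className(0.0));                    // prints "yes"
    }
}
```

For the two-attribute comment dataset in the question, the header would likewise need to declare the class labels in the same order used at training time, otherwise the predicted index maps to the wrong name.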