I have a highly imbalanced dataset: a great many instances of class 0 and very few of class 1. This is a problem because the class-0 instances dominate the class-1 instances, and precision for class 1 is very low. I am using the Weka Java API, and I found that it offers an option to assign weights to instances, so I decided to weight the class-1 instances in my test set as follows:
import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;

BufferedReader breader = new BufferedReader(new FileReader("weka/train.txt"));
Instances train = new Instances(breader);
train.setClassIndex(train.numAttributes() - 1);

Instances testset = new Instances(new BufferedReader(new FileReader("weka/test.txt")));
testset.setClassIndex(testset.numAttributes() - 1);

// give every class-1 instance in the test set a weight of 30
for (int i = 0; i < testset.numInstances(); i++) {
    if (testset.instance(i).classValue() == 1) {
        testset.instance(i).setWeight(30);
    }
}
After doing so, the precision increased a lot. Now I am wondering whether what I am doing is acceptable and, if so, how I can reason about it.
You must consider that the weights you assign to instances affect your prediction model: heavily weighted cases will also weigh heavily in the model's behaviour. You can fall into overfitting with this kind of unusual training, because the criterion used to train the model may no longer match the criterion used to judge the model's efficacy. However, if you cannot get more training data, it is a risk you can take; after all, it works for you.
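To see why the precision jumped, note that weighting the evaluation instances rescales the confusion-matrix counts themselves, independently of the model. Below is a minimal self-contained sketch (plain Java, no Weka; the confusion counts tp = 5, fp = 20 are made-up illustrative numbers, and the weight 30 mirrors the question). Up-weighting only the true class-1 instances multiplies every true positive by 30 while false positives, which are true class-0 instances, keep weight 1:

```java
public class WeightedPrecisionDemo {

    /**
     * Precision for the positive class when every true positive carries
     * weight w and every false positive carries weight 1 (which is what
     * happens when only the true class-1 test instances are up-weighted).
     */
    static double precision(double w, int tp, int fp) {
        return (w * tp) / (w * tp + fp);
    }

    public static void main(String[] args) {
        int tp = 5;   // hypothetical: true class 1, predicted class 1
        int fp = 20;  // hypothetical: true class 0, predicted class 1

        System.out.printf("unweighted precision: %.3f%n", precision(1, tp, fp));
        System.out.printf("weight-30 precision:  %.3f%n", precision(30, tp, fp));
    }
}
```

With these made-up numbers, precision for class 1 rises from 5/25 = 0.200 to 150/170 ≈ 0.882 without the classifier changing at all, which is why weighting the test set can make the metric look much better than the model actually is.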