Using a dataset, Weka and the J48 classifier I've got the following tree:
It splits on 'NumTweets' many times on the right side. Can I prevent J48 from making more than a specified number of splits on one field? This looks like overfitting to that one field. Ideally, I'd want the same field to be reused at most 3-4 times within a branch. Is there any way I can do this?
Thanks in advance!
To answer your first question: no, the Weka Explorer does not offer a limit on the number of splits for a specific attribute. That can only be done by writing the code yourself.
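To give an idea of what "manually in code" could look like, here is a minimal sketch of the bookkeeping involved. It is a standalone toy tree, not Weka code and not a patch to J48: the class `CappedTree`, the `maxUsesPerBranch` parameter and the helper `maxReuse` are all made up for illustration, and it scores splits with Gini impurity rather than J48's gain ratio to keep it short. The only point is the `uses` counter, which tracks how often each attribute has already been split on along the current branch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy decision tree (NOT Weka code). All names here are hypothetical.
class CappedTree {
    static class Node {
        int attr = -1;      // split attribute index; -1 marks a leaf
        double threshold;   // rows with value <= threshold go left
        int label;          // majority class of the rows at this node
        Node left, right;
    }

    final int maxUsesPerBranch; // cap on reusing one attribute along a branch
    final int numClasses;

    CappedTree(int maxUsesPerBranch, int numClasses) {
        this.maxUsesPerBranch = maxUsesPerBranch;
        this.numClasses = numClasses;
    }

    Node build(double[][] x, int[] y) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < y.length; i++) rows.add(i);
        return grow(x, y, rows, new int[x[0].length]);
    }

    private Node grow(double[][] x, int[] y, List<Integer> rows, int[] uses) {
        Node node = new Node();
        int[] counts = new int[numClasses];
        for (int i : rows) counts[y[i]]++;
        for (int c = 0; c < numClasses; c++) {
            if (counts[c] > counts[node.label]) node.label = c;
        }
        double parent = gini(counts, rows.size());
        if (parent == 0.0) return node;                 // pure node: stop

        double bestGain = 1e-12;
        for (int a = 0; a < uses.length; a++) {
            if (uses[a] >= maxUsesPerBranch) continue;  // cap reached on this branch
            for (int t : rows) {                        // each value is a candidate threshold
                int[] lc = new int[numClasses], rc = new int[numClasses];
                int ln = 0;
                for (int i : rows) {
                    if (x[i][a] <= x[t][a]) { lc[y[i]]++; ln++; } else { rc[y[i]]++; }
                }
                int rn = rows.size() - ln;
                if (rn == 0) continue;                  // split separates nothing
                double gain = parent - (ln * gini(lc, ln) + rn * gini(rc, rn)) / rows.size();
                if (gain > bestGain) { bestGain = gain; node.attr = a; node.threshold = x[t][a]; }
            }
        }
        if (node.attr < 0) return node;                 // no allowed split improves purity

        List<Integer> leftRows = new ArrayList<>(), rightRows = new ArrayList<>();
        for (int i : rows) (x[i][node.attr] <= node.threshold ? leftRows : rightRows).add(i);

        uses[node.attr]++;                              // count this split on the current path
        node.left = grow(x, y, leftRows, uses);
        node.right = grow(x, y, rightRows, uses);
        uses[node.attr]--;                              // backtrack so sibling branches count separately
        return node;
    }

    static double gini(int[] counts, int n) {
        double g = 1.0;
        for (int c : counts) { double p = (double) c / n; g -= p * p; }
        return g;
    }

    // Largest number of times any single attribute is split on along one root-to-leaf path.
    static int maxReuse(Node n, int[] uses) {
        if (n.attr < 0) { int m = 0; for (int u : uses) m = Math.max(m, u); return m; }
        uses[n.attr]++;
        int m = Math.max(maxReuse(n.left, uses), maxReuse(n.right, uses));
        uses[n.attr]--;
        return m;
    }

    public static void main(String[] args) {
        // One numeric attribute with alternating classes, so an uncapped tree keeps re-splitting on it.
        double[][] x = {{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}};
        int[] y = {0, 1, 0, 1, 0, 1, 0, 1};
        Node capped = new CappedTree(2, 2).build(x, y);
        System.out.println("max reuse on any branch: " + maxReuse(capped, new int[1]));
    }
}
```

Doing the same inside Weka would mean modifying or subclassing the tree-building classes so that a similar per-branch counter is consulted when candidate splits are evaluated. I have not shown that here because it depends on Weka internals.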
With that said, there are several things you can try here to limit the tree size/reduce overfitting.
You could try REPTree instead of J48. It splits on information gain (J48 uses the closely related gain ratio) and prunes with reduced-error pruning. It has an option to limit the depth of the tree (maxDepth, -L).
Decreasing the J48 pruning confidence (confidenceFactor, the -C parameter, default 0.25) will result in more pruning and thus a smaller tree.
You can also increase the minNumObj parameter (-M, the minimum number of instances per leaf, default 2). Larger values stop the tree from making splits that isolate only a handful of instances.
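If you drive Weka from Java rather than the Explorer, the three suggestions above might look roughly like this. This is a sketch, not a recommendation of specific values: the file name `tweets.arff`, the class name `SmallerTrees`, the assumption that the class attribute is the last column, and the settings 0.1 / 20 / 4 are all placeholders you would need to tune (ideally with cross-validation).

```java
import weka.classifiers.trees.J48;
import weka.classifiers.trees.REPTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch only: class name, file name and parameter values are placeholders.
class SmallerTrees {

    // J48 with more aggressive pruning and larger leaves.
    static J48 tunedJ48() {
        J48 j48 = new J48();
        j48.setConfidenceFactor(0.1f); // -C: lower than the default 0.25 means more pruning
        j48.setMinNumObj(20);          // -M: require at least 20 instances per leaf (default 2)
        return j48;
    }

    // REPTree with a hard limit on tree depth.
    static REPTree depthLimitedRepTree() {
        REPTree rep = new REPTree();
        rep.setMaxDepth(4);            // -L: default -1 means no depth limit
        return rep;
    }

    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("tweets.arff"); // placeholder path
        data.setClassIndex(data.numAttributes() - 1);    // assumes the class is the last attribute

        J48 j48 = tunedJ48();
        j48.buildClassifier(data);
        System.out.println("J48 tree size: " + j48.measureTreeSize());

        REPTree rep = depthLimitedRepTree();
        rep.buildClassifier(data);
        System.out.println(rep);                         // prints the pruned tree
    }
}
```

In the Explorer the same settings are available by clicking on the classifier name and editing confidenceFactor, minNumObj (J48) or maxDepth (REPTree) in the options dialog.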