Can anybody tell me how to create training data for categorization? I am using OpenNLP for categorization. Is there a tool for creating training data, or, if I have to create it manually, how should it be done? I am a complete noob in this field. Please help.
Well, normally you have some kind of historical data from previous (manual) categorization. Otherwise you would have to create the data that you need somehow. Such data is often created by observation, i.e. someone looks at examples and labels them by hand.
How to do that depends heavily on the data you are trying to categorize, though.
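If you do have labelled examples, turning them into a training file is mostly a formatting job. As far as I know, the OpenNLP document categorizer expects one document per line, with the category first and the text after it, separated by whitespace. Here is a minimal sketch in Python under that assumption; the function names and the two sample documents are made up for illustration and are not part of OpenNLP.

```python
import re


def to_doccat_line(category, text):
    """Format one labelled document as 'category text' on a single line."""
    # The label is the first whitespace-separated token, so it must not
    # contain whitespace itself.
    label = re.sub(r"\s+", "_", category.strip())
    # Collapse newlines and tabs so each sample stays on one line.
    body = " ".join(text.split())
    if not label or not body:
        raise ValueError("category and text must both be non-empty")
    return f"{label} {body}"


def build_training_data(labelled_docs):
    """Join (category, text) pairs into the content of a training file."""
    return "\n".join(to_doccat_line(c, t) for c, t in labelled_docs) + "\n"


# Hypothetical historical data: (category, text) pairs labelled by hand.
examples = [
    ("Sports", "The home team won the match\nin extra time."),
    ("Politics", "Parliament passed the new budget today."),
]

training_text = build_training_data(examples)
print(training_text, end="")
```

You would write `training_text` to a file and hand that file to the OpenNLP document categorizer trainer. You need a good number of examples per category for the model to be useful; two lines are only enough to show the layout.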
If you were able to generate training data automatically, you would already have a perfect algorithm for the data, and you would not need to train a system, would you?
If it is not possible to get training data, you might have to look at algorithms that don't need to be trained up front, i.e. ones that learn as data comes in while someone keeps correcting the system's mistakes.
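To give an idea of what "learning as data comes in" can look like, here is a small sketch of a mistake-driven categorizer (a perceptron-style update over word counts). It is not an OpenNLP feature, just an illustration in plain Python; the class name, the whitespace tokenizer, and the sample messages are all assumptions of mine.

```python
from collections import defaultdict


class OnlineCategorizer:
    """Toy online learner: weights change only when a human corrects it."""

    def __init__(self):
        # weights[category][word] -> score contribution of that word
        self.weights = defaultdict(lambda: defaultdict(float))

    @staticmethod
    def features(text):
        # Assumption: lowercase whitespace tokens are good enough here.
        return text.lower().split()

    def predict(self, text):
        """Return the best-scoring known category, or None if none exist yet."""
        if not self.weights:
            return None
        words = self.features(text)
        scores = {
            cat: sum(w.get(word, 0.0) for word in words)
            for cat, w in self.weights.items()
        }
        return max(scores, key=scores.get)

    def correct(self, text, true_category):
        """Feed back the right answer; update only if the guess was wrong."""
        predicted = self.predict(text)
        self.weights[true_category]  # register the category if it is new
        if predicted == true_category:
            return
        for word in self.features(text):
            self.weights[true_category][word] += 1.0
            if predicted is not None:
                self.weights[predicted][word] -= 1.0


model = OnlineCategorizer()
model.correct("cheap pills offer", "spam")
model.correct("meeting agenda for monday", "ham")
print(model.predict("cheap offer"))
print(model.predict("monday meeting"))
```

The system starts out knowing nothing and gets better each time someone calls `correct` with the right category, so the corrections themselves end up being your training data.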