I have read through Lingpipe for NLP and found that we have a capability there to identify mentions of names of people, locations and organizations. My questions is that if I have a training set of documents that have mentions of let's say software projects inside the text, can I use this training set to train a named entity recognizer? Once the training is complete, I should be able to feed a test set of textual documents to the trained model and I should be able to identify mentions of software projects there.
Is this generic NER possible using NER? If so, what features should I be using that I should feed?
Thanks Abhishek S
Provided that you have enough training data with tagged software projects that would be possible.
If using Lingpipe, I would use character n-grams model as the first option for your task. They are simple and usually do the work. If results are not good enough some of the standard NER features are:
Finally, for each token you could add contextual features, tokens before the current one (t-1, t-2...), tokens after the current one (t+1,t+2...) as well as their bigram combinations (t-2^t-1), (t+1^t+2).