I have a set of financial documents (fixed-term deposit documents, credit card documents). I want to automatically identify and tag financial entities/instruments in those documents.
For example, suppose a document contains the phrase “reserves the right to repay with interest without notice”. I want to identify the financial term related to it and tag the phrase with that term; for this phrase it is “Callable”. For the phrase “permit premature withdrawal” the related financial term is “Putable”, so if this phrase appears in a document I want to tag it with the term “Putable”.
The financial terms will come from the Financial Industry Business Ontology. Is it possible to use the Stanford parser for this purpose? Can I use a POS tagger for this? I may have to train the Stanford parser on financial instruments. If that is possible, how can I train it to identify financial instruments?
A parser or part-of-speech tagger will not identify domain-specific concepts such as these out of the box. However, the natural language analysis they provide may be a useful building block for a solution. Alternatively, if the phrases you need to identify are close to fixed phrases, a parser and tagger may be unnecessary, and you should concentrate on finding the fixed phrases and classifying them.
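If the phrases really are close to fixed, the "find and classify" route can be sketched with plain regular expressions. This is only a minimal illustration in Python: the `RULES` table, its patterns, and the `tag_document` helper are hypothetical, and only the two tags and example phrases come from the question.

```python
import re

# Hypothetical rule table: each tag maps to regex patterns for near-fixed phrases.
# The patterns are assumptions built from the two example phrases in the question.
RULES = {
    "Callable": [r"reserves?\s+the\s+right\s+to\s+repay\b"],
    "Putable": [r"permit(?:s|ted)?\s+premature\s+withdrawal\b"],
}

# Compile once; matching is case-insensitive and tolerant of extra whitespace.
COMPILED = [
    (tag, re.compile(pattern, re.IGNORECASE))
    for tag, patterns in RULES.items()
    for pattern in patterns
]


def tag_document(text):
    """Return (start, end, tag, matched_text) tuples sorted by position."""
    hits = []
    for tag, pattern in COMPILED:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), tag, match.group(0)))
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "The bank reserves the right to repay with interest without notice. "
        "We permit premature withdrawal after 30 days."
    )
    for start, end, tag, phrase in tag_document(sample):
        print(f"{tag}: {phrase!r} at {start}-{end}")
```

In practice the patterns would be derived from the wording that actually occurs in your documents, with each tag mapped back to the corresponding ontology term.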
While these are not "named entities", the problem is closer to named entity recognition, in that you are recognizing semantic phrase classes. You could either annotate examples of the phrases you wish to find and train a model with a named entity recognizer (e.g., Stanford NER), or write rules that match instances (using something like ANNIE in GATE or Stanford's TokensRegex).
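For the annotate-and-train route, the annotated examples have to be turned into token-level labels. The sketch below assumes the common one-token-per-line, tab-separated `token<TAB>label` layout with `O` as the background label; check the exact format your NER toolkit expects. The `to_training_lines` helper, the whitespace tokenization, and the `PUTABLE` label name are all hypothetical simplifications.

```python
def to_training_lines(sentence, phrase, label, background="O"):
    """Label every token of `phrase` inside `sentence`; all other tokens get `background`.

    Uses naive whitespace tokenization and case-insensitive matching, which is
    an assumption made for illustration, not what a real tokenizer would do.
    """
    tokens = sentence.split()
    phrase_tokens = [p.lower() for p in phrase.split()]
    labels = [background] * len(tokens)
    n = len(phrase_tokens)
    # Slide a window over the sentence and mark each exact token-level match.
    for i in range(len(tokens) - n + 1):
        window = [t.lower() for t in tokens[i:i + n]]
        if window == phrase_tokens:
            labels[i:i + n] = [label] * n
    return [f"{token}\t{lab}" for token, lab in zip(tokens, labels)]


if __name__ == "__main__":
    lines = to_training_lines(
        "We permit premature withdrawal today",
        "permit premature withdrawal",
        "PUTABLE",
    )
    print("\n".join(lines))
```

A model trained on enough lines like these can generalize to paraphrases that a fixed rule would miss, at the cost of having to annotate the training data first.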