I am running a small test application using the Stanford Parser.
The parser correctly recognizes cardinals such as "1990", "one", "two" and "three". I am looking for a way to retrieve the integer values of the annotated tokens. This is especially interesting for text that does not consist of digits in the first place, like "one", "two", etc.
Is there a built in feature for this?
The parser itself doesn't include anything like that, but CoreNLP does have such functionality.
You can apply the following function to the CoreMap object of each sentence. It adds the NumerizedTokensAnnotation to the sentence and the NumericValueAnnotation to each token:
NumberNormalizer.findAndAnnotateNumericExpressions(sentence);
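A minimal sketch of how this could fit into a pipeline, under some assumptions: the CoreNLP jar and its models jar are on the classpath, "tokenize,ssplit,pos" is assumed to be enough input for NumberNormalizer, and the class NumericValuesDemo and method numericValues are hypothetical names used only for illustration. Based on the comments in the NumberNormalizer source, the sketch reads NumericCompositeValueAnnotation from the merged list that the call returns, because that key holds the value of a whole multi-token number ("two thousand" gives 2000). The per-token NumericValueAnnotation would instead give 2 and 1000 separately.

```java
import edu.stanford.nlp.ie.NumberNormalizer;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical demo class: collects the numeric value of every number expression in a text.
public class NumericValuesDemo {

    // Build the pipeline once, since loading the POS model is relatively expensive.
    private static final StanfordCoreNLP PIPELINE = buildPipeline();

    private static StanfordCoreNLP buildPipeline() {
        Properties props = new Properties();
        // Assumption: tokenization, sentence splitting and POS tags are sufficient input.
        props.setProperty("annotators", "tokenize,ssplit,pos");
        return new StanfordCoreNLP(props);
    }

    /** Returns the value of each number expression found in the text, in document order. */
    public static List<Number> numericValues(String text) {
        Annotation document = new Annotation(text);
        PIPELINE.annotate(document);

        List<Number> values = new ArrayList<>();
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Merges multi-token numbers into single CoreMaps and also stores the
            // merged list on the sentence as NumerizedTokensAnnotation.
            List<CoreMap> numerized = NumberNormalizer.findAndAnnotateNumericExpressions(sentence);
            for (CoreMap token : numerized) {
                // Non-numeric tokens carry no composite value, so this is null for them.
                Number value = token.get(CoreAnnotations.NumericCompositeValueAnnotation.class);
                if (value != null) {
                    values.add(value);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        for (Number n : numericValues("I bought three apples in 1990.")) {
            // intValue() turns the returned Number into the integer the question asks for.
            System.out.println(n.intValue());
        }
    }
}
```

The returned values are typed as Number rather than int because the normalizer can also produce non-integer values (e.g. for fractions), so calling intValue() is only safe when you know the expression is a whole number.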
Unfortunately, there isn't any documentation for this feature, but you can take a look at the source of NumberNormalizer, which contains at least some comments and explanations.