java, nlp, stanford-nlp

Stanford Parser: Get Integer value for CARD?


I am running a small test application using the Stanford Parser.

The parser correctly recognizes cardinals such as "1990", "one", "two", "three". I am looking for a way to retrieve the integer values for the annotated text. This is especially of interest for text that does not consist of digits to begin with, like "one", "two", etc.

Is there a built-in feature for this?


Solution

  • The parser itself doesn't include anything like that, but CoreNLP does provide this functionality.

    You can apply the following function to the CoreMap object of each sentence. It adds the NumerizedTokensAnnotation to the sentence and the NumericValueAnnotation to each token it recognizes as a number.

    NumberNormalizer.findAndAnnotateNumericExpressions(sentence);
    

    Unfortunately, this feature is not documented, but you can look at the source of NumberNormalizer, which contains at least some comments and explanations.
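
    For context, here is a minimal sketch of how the call above might be wired into a small program. It assumes the CoreNLP jar is on the classpath and uses only the tokenize and ssplit annotators. The class name NumericValueDemo and the helper numericValues are made up for illustration; they are not part of CoreNLP. Tokens that the normalizer does not treat as numbers have no NumericValueAnnotation, so the sketch skips null values.

```java
import edu.stanford.nlp.ie.NumberNormalizer;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical demo class, not part of CoreNLP.
class NumericValueDemo {

    // Returns token text -> numeric value for every token that received
    // a NumericValueAnnotation. A repeated word overwrites its earlier entry.
    static Map<String, Number> numericValues(String text) {
        // Tokenization and sentence splitting are enough to get CoreMap sentences.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation(text);
        pipeline.annotate(document);

        Map<String, Number> values = new LinkedHashMap<>();
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            // Adds NumerizedTokensAnnotation to the sentence and
            // NumericValueAnnotation to the tokens recognized as numbers.
            NumberNormalizer.findAndAnnotateNumericExpressions(sentence);

            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                Number value = token.get(CoreAnnotations.NumericValueAnnotation.class);
                if (value != null) {
                    values.put(token.word(), value);
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        numericValues("She bought two apples and three pears.")
            .forEach((word, value) -> System.out.println(word + " -> " + value));
    }
}
```

    The values come back as Number, so call intValue() or longValue() if you need a primitive. Multi-word expressions such as "twenty five" are merged in the list stored under NumerizedTokensAnnotation on the sentence, which is the place to look if you need one value per expression rather than per token.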