Tags: stanford-nlp, named-entity-recognition, sutime

CoreNLP - NER and SUTime to only recognize absolute dates


I'm working with the Named Entity Recognition annotator of CoreNLP.

My problem is that I do not want relative dates to be recognized as entities. My goal is to connect dates with events.

Some interesting dates are 18 Feb 1997, the 20th of July, the year 1992, 4 days from today and Monday the 13th.

In this example I would like to highlight "18 Feb 1997", "20th of July" and "1992". Even though some of these dates are incomplete, they can still be used to search for events.

On the other hand, "4 days from today" and "Monday the 13th" are not interesting for me: the first is relative to the current date (or to the date the text was written), while the second is too generic.

Is there a simple way to tell the NER annotator to discard relative dates?

Thank you


Solution

  • I found the following solution, which works very well in my case.

    Each token tagged as a DATE or TIME named entity carries an annotation (NormalizedNamedEntityTagAnnotation) containing the normalized form of the expression it belongs to.

    The absolute dates that I want to keep have a normalized form that follows this pattern:

    • 18 Feb 1997 -> 1997/02/18
    • 20th of July -> XXXX/07/20
    • 1992 -> 1992

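    Using a regular expression, it is possible to discard annotations whose normalized form does not follow this pattern. The expression has to match the whole normalized value, not just a prefix of it: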

    (\d{4}|X{4})((\/\d{2}(\/\d{2})?)?)
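The filtering step can be sketched in Java, the language of CoreNLP's API. This is a minimal sketch under a few assumptions: DateMention is a hypothetical stand-in for a CoreNLP DATE mention, holding its surface text and the value of CoreAnnotations.NormalizedNamedEntityTagAnnotation. The pattern accepts both "/" and "-" as separators because SUTime typically normalizes dates to TIMEX3 (ISO 8601) values such as 1997-02-18. The normalized strings shown for the two relative dates are illustrative only, not actual SUTime output. The record requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class AbsoluteDateFilter {

    // Hypothetical stand-in for a CoreNLP DATE mention: its surface text and
    // the value stored in its normalized NER annotation.
    public record DateMention(String text, String normalized) {}

    // Four digits (or XXXX for an unknown year), optionally followed by a
    // two-digit month and then a two-digit day. "/" and "-" are both accepted
    // as separators (assumption: SUTime values use ISO 8601 hyphens).
    private static final Pattern ABSOLUTE =
            Pattern.compile("(\\d{4}|X{4})([-/]\\d{2}([-/]\\d{2})?)?");

    // matches() requires the whole value to fit the pattern; find() would
    // also accept any value that merely contains four digits or XXXX.
    public static boolean isAbsolute(String normalized) {
        return normalized != null && ABSOLUTE.matcher(normalized).matches();
    }

    // Keeps only the mentions whose normalized value looks like an absolute date.
    public static List<DateMention> keepAbsolute(List<DateMention> mentions) {
        List<DateMention> kept = new ArrayList<>();
        for (DateMention m : mentions) {
            if (isAbsolute(m.normalized())) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<DateMention> mentions = List.of(
                new DateMention("18 Feb 1997", "1997-02-18"),
                new DateMention("the 20th of July", "XXXX-07-20"),
                new DateMention("1992", "1992"),
                // Illustrative values only; real SUTime output may differ.
                new DateMention("4 days from today", "OFFSET P4D"),
                new DateMention("Monday the 13th", "XXXX-XX-13"));
        for (DateMention m : keepAbsolute(mentions)) {
            System.out.println(m.text() + " -> " + m.normalized());
        }
    }
}
```

Running main prints only the three absolute dates. The important choice is matches() rather than find(): the pattern is not anchored, so find() would also accept a value like XXXX-XX-13 simply because it begins with XXXX.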