java date nlp gate temporal

How to retrieve all kinds of dates and temporal values from text


I want to retrieve dates and other temporal entities from a set of Strings. Can this be done in Java without hand-writing a parser for every date format? Most parsers handle only a limited set of input patterns, but my input is entered manually and is therefore ambiguous.

Inputs can be like:

12th Sep |mid-March |12.September.2013

Sep 12th |12th September| 2013

Sept 13 |12th, September |12th,Feb,2013

I've gone through many answers on finding dates in Java, but most of them don't deal with such a wide range of input patterns.

I've tried the SimpleDateFormat class, calling parse() and treating a failed parse as a sign that the string is not a date. I've tried regex, but I'm not sure it is a good fit for this scenario. I've also used ClearNLP to annotate the dates, but it doesn't give a reliable annotation set.
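For reference, the "try parse() and see if it breaks" approach can be sketched roughly as below. This is only an illustration under assumptions: the class name `PatternProbe`, the method `tryParse`, and the three patterns are made up for this example and are nowhere near a complete list. It also shows why the approach scales poorly: every new input shape needs its own pattern.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.Optional;

class PatternProbe {
    // Hypothetical, deliberately incomplete pattern list (an assumption for this sketch).
    private static final String[] PATTERNS = {
        "dd.MMMM.yyyy",       // e.g. 12.September.2013
        "dd/MM/yy HH:mm:ss",  // e.g. 11/11/11 11:11:11
        "MMM dd yyyy"         // e.g. Sep 12 2013
    };

    // Returns the parsed date if some pattern consumes the whole string, else empty.
    static Optional<Date> tryParse(String text) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject out-of-range values such as month 13
            ParsePosition position = new ParsePosition(0);
            Date parsed = format.parse(text, position); // null on failure, no exception
            if (parsed != null && position.getIndex() == text.length()) {
                return Optional.of(parsed);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"12.September.2013", "12th Sep", "mid-March"}) {
            System.out.println(input + " -> " + tryParse(input).isPresent());
        }
    }
}
```

Inputs such as "12th Sep" or "mid-March" fall through every pattern here, which is the limitation described above.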

The closest approach to getting these values could be a Chain of Responsibility, as mentioned below. Is there a library with a ready-made set of date patterns that I could use?
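To make the Chain of Responsibility idea concrete, here is a minimal sketch in which each handler owns one regex and passes the input on when it does not match. All names (`TemporalHandler`, `RegexHandler`, `TemporalChain`), the labels, and the regexes are assumptions invented for this example. The month pattern is deliberately loose, so it would also accept words like "market".

```java
import java.util.Optional;
import java.util.regex.Pattern;

// One link in the chain: answers itself or delegates to the next link.
abstract class TemporalHandler {
    private TemporalHandler next;

    // Appends a handler and returns it, so calls can be chained.
    TemporalHandler linkWith(TemporalHandler next) {
        this.next = next;
        return next;
    }

    Optional<String> handle(String text) {
        Optional<String> own = classify(text.trim());
        if (own.isPresent()) {
            return own;
        }
        return next == null ? Optional.empty() : next.handle(text);
    }

    protected abstract Optional<String> classify(String text);
}

// Handler that labels the input when its regex matches the whole string.
class RegexHandler extends TemporalHandler {
    private final Pattern pattern;
    private final String label;

    RegexHandler(String regex, String label) {
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        this.label = label;
    }

    @Override
    protected Optional<String> classify(String text) {
        return pattern.matcher(text).matches() ? Optional.of(label) : Optional.empty();
    }
}

class TemporalChain {
    // Loose month matcher: a three-letter prefix plus any trailing letters (hypothetical).
    private static final String MONTH =
        "(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*";
    private static final String DAY = "\\d{1,2}(?:st|nd|rd|th)?";

    static TemporalHandler build() {
        TemporalHandler head =
            new RegexHandler(DAY + "[ ,.]*" + MONTH + "(?:[ ,.]*\\d{4})?", "DAY_MONTH");
        head.linkWith(new RegexHandler(MONTH + "[ ,.]*" + DAY, "MONTH_DAY"))
            .linkWith(new RegexHandler("(?:early|mid|late)[- ]" + MONTH, "VAGUE_MONTH"));
        return head;
    }

    public static void main(String[] args) {
        TemporalHandler chain = build();
        for (String input : new String[] {"12th Sep", "Sept 13", "mid-March", "hello"}) {
            System.out.println(input + " -> " + chain.handle(input).orElse("NONE"));
        }
    }
}
```

New input shapes are supported by linking another handler, without touching the existing ones. The cost is that the pattern set still has to be maintained by hand.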


Solution

  • Yes! I've finally extracted all sorts of dates/temporal values that can be as generic as:

    mid-March | Last Month | 9/11

    To as specific as:

    11/11/11 11:11:11

    This finally worked thanks to the excellent GATE libraries and JAPE, GATE's annotation pattern language.

    I created a more lenient annotation rule in JAPE, called 'DateEnhanced', that also covers dates such as "9/11" or "11TH, February- 2001". On the R.H.S. of the 'DateEnhanced' JAPE rule I chained several Java regexes to filter out unwanted matches.
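The filtering step can be illustrated in plain Java, outside GATE. This is not the actual 'DateEnhanced' rule: the class `CandidateFilter`, the method `keepPlausible`, and the rejection patterns are hypothetical. They only show how a chain of regexes can discard candidates that a lenient rule picked up by mistake.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CandidateFilter {
    // Hypothetical rejection rules; a candidate is dropped if ANY of them matches.
    private static final List<Pattern> REJECT = Arrays.asList(
        Pattern.compile("24/7"),       // idiom, not a date
        Pattern.compile("0{1,2}/.*")   // first component is zero, e.g. 00/12
    );

    // Keeps only the candidates that pass every rejection rule.
    static List<String> keepPlausible(List<String> candidates) {
        return candidates.stream()
            .filter(candidate -> REJECT.stream()
                .noneMatch(rule -> rule.matcher(candidate.trim()).matches()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> candidates =
            Arrays.asList("9/11", "24/7", "00/12", "11TH, February- 2001");
        System.out.println(keepPlausible(candidates));
    }
}
```

In a real JAPE rule the same checks would presumably run on the text covered by each matched annotation before the annotation is created.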