I'm following the official instructions for adding custom SUTime rules for fiscal year quarters (e.g. Q1, Q2, Q3 and Q4).
I used the default defs.sutime.txt and english.sutime.txt as templates for my own rule files.
After appending the following code to my defs.sutime.txt
// Financial Quarters
FYQ1 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ1",
  value: TimeWithRange(TimeRange(IsoDate(ANY,10,1), IsoDate(ANY,12,31), QUARTER))
}
FYQ2 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ2",
  value: TimeWithRange(TimeRange(IsoDate(ANY,1,1), IsoDate(ANY,3,31), QUARTER))
}
FYQ3 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ3",
  value: TimeWithRange(TimeRange(IsoDate(ANY,4,1), IsoDate(ANY,6,30), QUARTER))
}
FYQ4 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ4",
  value: TimeWithRange(TimeRange(IsoDate(ANY,7,1), IsoDate(ANY,9,30), QUARTER))
}
and appending the following code to my english.sutime.txt
# Financial Quarters
FISCAL_YEAR_QUARTER_MAP = {
  "Q1": FYQ1,
  "Q2": FYQ2,
  "Q3": FYQ3,
  "Q4": FYQ4
}
FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP = {
  "Q1": 1,
  "Q2": 0,
  "Q3": 0,
  "Q4": 0
}
$FiscalYearQuarterTerm = CreateRegex(Keys(FISCAL_YEAR_QUARTER_MAP))
{
  matchWithResults: TRUE,
  pattern: ((/$FiscalYearQuarterTerm/) (FY)? (/(FY)?([0-9]{4})/)),
  result: TemporalCompose(INTERSECT, IsoDate(Subtract({type: "NUMBER", value: $$3.matchResults[0].word.group(2)}, FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP[$1[0].word]), ANY, ANY), FISCAL_YEAR_QUARTER_MAP[$1[0].word])
}
{
  pattern: ((/$FiscalYearQuarterTerm/)),
  result: FISCAL_YEAR_QUARTER_MAP[$1[0].word]
}
I'm still unable to correctly parse expressions like "Q1 2020".
How can I properly add rules for parsing fiscal year quarters (e.g. "Q1")?
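To make the intended behaviour concrete, here is a minimal plain-Java sketch of the mapping these rules are meant to express. It does not use SUTime at all; the class name FiscalQuarterSketch, the resolve helper and the string regex are hypothetical stand-ins for the token pattern and the two maps above, assuming a fiscal year whose first quarter starts on October 1 of the previous calendar year.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FiscalQuarterSketch {
    // First calendar month of each fiscal quarter (mirrors FYQ1..FYQ4).
    static final Map<String, Integer> START_MONTH =
            Map.of("Q1", 10, "Q2", 1, "Q3", 4, "Q4", 7);

    // Years to subtract from the stated fiscal year
    // (mirrors FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP).
    static final Map<String, Integer> YEAR_OFFSET =
            Map.of("Q1", 1, "Q2", 0, "Q3", 0, "Q4", 0);

    // Quarter term, an optional "FY" (separate or glued to the year),
    // then a four-digit year. This is a plain string regex, not TokensRegex.
    static final Pattern PATTERN =
            Pattern.compile("\\b(Q[1-4])\\s+(?:FY\\s*)?([0-9]{4})\\b");

    /** Returns {start, end} (both inclusive) for the first match, or null if none. */
    static LocalDate[] resolve(String text) {
        Matcher m = PATTERN.matcher(text);
        if (!m.find()) {
            return null;
        }
        String quarter = m.group(1);
        int calendarYear = Integer.parseInt(m.group(2)) - YEAR_OFFSET.get(quarter);
        YearMonth first = YearMonth.of(calendarYear, START_MONTH.get(quarter));
        // A quarter spans three months: the first month plus the two after it.
        return new LocalDate[] { first.atDay(1), first.plusMonths(2).atEndOfMonth() };
    }

    public static void main(String[] args) {
        LocalDate[] range = resolve("Stuff for Q1 2020");
        System.out.println(range[0] + " to " + range[1]); // 2019-10-01 to 2019-12-31
    }
}
```

So for the input "Q1 2020" the result I am after is the range from 2019-10-01 to 2019-12-31, while "Q2 2020" should stay in calendar year 2020.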
Here's my full code:
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;

public class SUTimeSoExample {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("sutime.includeRange", "true");
        props.setProperty("sutime.markTimeRanges", "true");
        props.setProperty("sutime.rules", "./defs.sutime.txt,./english.sutime.txt");

        AnnotationPipeline pipeline = new AnnotationPipeline();
        pipeline.addAnnotator(new TokenizerAnnotator(false));
        pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
        pipeline.addAnnotator(new POSTaggerAnnotator(false));
        pipeline.addAnnotator(new TimeAnnotator("sutime", props));

        String input = "Stuff for Q1 2020";
        Annotation annotation = new Annotation(input);
        annotation.set(CoreAnnotations.DocDateAnnotation.class, "2020-06-01");
        pipeline.annotate(annotation);

        System.out.println(annotation.get(CoreAnnotations.TextAnnotation.class));
        List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
        for (CoreMap cm : timexAnnsAll) {
            System.out.println(cm // match
                    + " --> " + cm.get(TimeExpression.Annotation.class).getTemporal() // parsed value
            );
        }
    }
}
Note that I deleted the default defs.sutime.txt and english.sutime.txt files from the Stanford CoreNLP models JAR in order to avoid this issue.
There is a Java code example here:
https://stanfordnlp.github.io/CoreNLP/sutime.html
It should work if you follow that example, mainly building your pipeline in this manner:
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
props.setProperty("ner.docDate.usePresent", "true");
// this shuts off the statistical models if you only want to run SUTime
props.setProperty("ner.rulesOnly", "true");
// add your sutime properties as in your example
...
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
and make sure to use version 4.0.0.
You can set ner.rulesOnly to true if you just want to run SUTime without running the statistical models.
For the document date, you can use one of several ner.docDate properties (such as ner.docDate.usePresent) or just set the document date in the annotation before running.
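As a rough sketch of how the property set might be assembled, the snippet below combines the annotators and ner.rulesOnly settings from this answer with the sutime.* properties from the question. It only uses java.util.Properties, so it runs without CoreNLP on the classpath; the class name SUTimeProps and the build helper are hypothetical, and it is an assumption here that the ner annotator picks up sutime.rules in the same way the standalone TimeAnnotator does.

```java
import java.util.Properties;
import java.util.TreeSet;

public class SUTimeProps {
    /**
     * Builds the properties to hand to the pipeline constructor.
     * rulePaths are the locations of your own rule files (hypothetical paths).
     */
    static Properties build(String... rulePaths) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // Rule-based components only, no statistical NER models.
        props.setProperty("ner.rulesOnly", "true");
        // SUTime settings copied from the question.
        props.setProperty("sutime.includeRange", "true");
        props.setProperty("sutime.markTimeRanges", "true");
        // Comma-separated list of rule files, in the order given.
        props.setProperty("sutime.rules", String.join(",", rulePaths));
        // The document date is not set here: either add a ner.docDate property
        // or set it on the annotation before running the pipeline.
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("./defs.sutime.txt", "./english.sutime.txt");
        // Print in sorted key order so the output is stable.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + " = " + props.getProperty(key));
        }
    }
}
```

The resulting Properties object would then be passed to new StanfordCoreNLP(props) as in the snippet above.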