My actual patterns are not in English, so I created this simplified example to reproduce the problem. There are three levels of annotation (required by the real application), and the third-level pattern does not work as expected. The phrase to be recognized is: a b c
What I expect:
# 1.
{ pattern: (/a/), action: (Annotate($0, name, "A")) }
{ pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{ pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }
# 3.
{ pattern: (([name:AB]+) /c/), action: (Annotate($0, name, "C")) }
Rules #1 and #2 work, and "a b" is annotated:
matched token: NamedEntitiesToken{word='a' name='AB' beginPosition=0 endPosition=1}
matched token: NamedEntitiesToken{word='b' name='AB' beginPosition=2 endPosition=3}
But pattern #3 does not match, even though there are clearly two tokens annotated "AB", which is exactly what pattern #3 expects. Moreover, if I change #1 to
{ pattern: (/a/), action: (Annotate($0, name, "AB")) }
{ pattern: (/b/), action: (Annotate($0, name, "AB")) }
then pattern #3 works correctly:
matched token: NamedEntitiesToken{word='a' name='C' beginPosition=0 endPosition=1}
matched token: NamedEntitiesToken{word='b' name='C' beginPosition=2 endPosition=3}
matched token: NamedEntitiesToken{word='c' name='C' beginPosition=4 endPosition=5}
I can't find any difference between the matched tokens when I use
# In this case #3 pattern works
{ pattern: (/a/), action: (Annotate($0, name, "AB")) }
{ pattern: (/b/), action: (Annotate($0, name, "AB")) }
or when I use
# In this case #3 pattern doesn't work
# 1.
{ pattern: (/a/), action: (Annotate($0, name, "A")) }
{ pattern: (/b/), action: (Annotate($0, name, "B")) }
# 2.
{ pattern: (([name:A]) ([name:B])), action: (Annotate($0, name, "AB")) }
In both cases the tokens end up with the same annotation, yet the first scenario works and the second doesn't. What am I doing wrong?
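This works for me. Each ENV.defaults["stage"] = N line assigns the rules that follow it to stage N, so every level of rules runs in its own stage: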
# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
ENV.defaults["stage"] = 1
{ ruleType: "tokens", pattern: (/a/), action: Annotate($0, ner, "A") }
{ ruleType: "tokens", pattern: (/b/), action: Annotate($0, ner, "B") }
ENV.defaults["stage"] = 2
{ ruleType: "tokens", pattern: ([{ner: "A"}] [{ner: "B"}]), action: Annotate($0, ner, "AB") }
ENV.defaults["stage"] = 3
{ ruleType: "tokens", pattern: ([{ner: "AB"}]+ /c/), action: Annotate($0, ner, "ABC") }
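To make the staging idea concrete, here is a toy model in plain Java. It is a hypothetical sketch, not CoreNLP's implementation: the class StagedRules and its methods are invented for illustration, labels are plain strings instead of NamedEntityTagAnnotation values, and "O" stands for an unlabeled token. Each stage runs over the whole sentence only after the previous stage has finished writing its labels, so the stage-3 rule can see the "AB" labels produced in stage 2.

```java
import java.util.Arrays;

// Toy model (hypothetical, not CoreNLP code) of staged rule application:
// all rules of stage N run only after stage N-1 has written its labels.
class StagedRules {

    // Stage 1: label single tokens by their word ("a" -> A, "b" -> B).
    static void stage1(String[] words, String[] tags) {
        for (int i = 0; i < words.length; i++) {
            if (words[i].equals("a")) tags[i] = "A";
            else if (words[i].equals("b")) tags[i] = "B";
        }
    }

    // Stage 2: an A token immediately followed by a B token becomes AB AB.
    static void stage2(String[] tags) {
        for (int i = 0; i + 1 < tags.length; i++) {
            if ("A".equals(tags[i]) && "B".equals(tags[i + 1])) {
                tags[i] = "AB";
                tags[i + 1] = "AB";
            }
        }
    }

    // Stage 3: one or more AB tokens followed by the word "c" become ABC.
    static void stage3(String[] words, String[] tags) {
        for (int end = 0; end < words.length; end++) {
            if (!words[end].equals("c")) continue;
            int start = end;
            // Walk left over the run of AB-labelled tokens before "c".
            while (start > 0 && "AB".equals(tags[start - 1])) start--;
            if (start < end) { // at least one AB token precedes "c"
                for (int k = start; k <= end; k++) tags[k] = "ABC";
            }
        }
    }

    // Runs the three stages in order and returns the final labels.
    public static String[] run(String[] words) {
        String[] tags = new String[words.length];
        Arrays.fill(tags, "O");
        stage1(words, tags);
        stage2(tags);
        stage3(words, tags);
        return tags;
    }

    public static void main(String[] args) {
        String[] tags = run(new String[] {"a", "b", "c"});
        System.out.println(String.join(" ", tags)); // ABC ABC ABC
    }
}
```

Running main on "a b c" prints ABC ABC ABC. If you swap the stage2 and stage3 calls in run, stage 3 sees only A and B labels, finds no AB tokens before "c", and the result is AB AB O instead, which shows why the order in which levels of rules are applied matters.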
There is a write-up about TokensRegex here: