I have a question about this rule: `Line{-REGEXP("CORA:.*") -> MARK(Reference)};` What does `CORA:.*` mean?
The string `CORA:.*` is interpreted as a regular expression. It is delegated directly to Java's `Pattern` implementation. It means that the text the regular expression is applied to must start with the string `CORA:`, followed by arbitrary characters (`.*`). The condition checking the regular expression is negated, so the rule creates an annotation of type `Reference` for each visible `Line` annotation that does not start with the string `CORA:`.
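To make this concrete, here is a minimal Java sketch that applies the same pattern string with `java.util.regex.Pattern` and negates the result, mimicking `-REGEXP("CORA:.*")`. The class and method names (`RutaRegexpDemo`, `isReference`) and the sample lines are hypothetical. It also assumes that the condition must match the whole covered text of a `Line` (as `Matcher.matches()` does), which is why the trailing `.*` is needed to accept anything after `CORA:`.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RutaRegexpDemo {
    // The pattern string from the rule, compiled with java.util.regex.Pattern.
    private static final Pattern CORA = Pattern.compile("CORA:.*");

    // Mirrors the negated condition -REGEXP("CORA:.*"):
    // true means the rule would MARK this line as Reference.
    // Assumption: the whole line text must match, as with Matcher.matches().
    public static boolean isReference(String lineText) {
        return !CORA.matcher(lineText).matches();
    }

    public static void main(String[] args) {
        // Hypothetical line texts standing in for Line annotations.
        List<String> lines = List.of(
                "CORA:",
                "CORA: author title year",
                "Smith, J. (2010). Some title.",
                " CORA: leading space");
        for (String line : lines) {
            String label = isReference(line) ? "Reference " : "not marked";
            System.out.println(label + " | " + line);
        }
    }
}
```

With these inputs, the two lines starting with `CORA:` are not marked, while the other two are. The line with a leading space is marked because `C` is not its first character; if the check used `Matcher.find()` instead of `matches()`, that line would match the pattern too and would not be marked.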
I assume that, in the context of this rule, the document starts with a `CORA:` line, which indicates the layout/structure/source of the references.
DISCLAIMER: I am a developer of UIMA Ruta