Is there a way, using UIMA Ruta, to find abbreviations that are used in their short form before their expansion appears?
Sample Input Document
Data science” is widely recognized as an increasingly powerful force in the realm of web management and development, as well as in society in general. ML is an application of artificial intelligence. On the He found an automated teller machine (ATM). Allowing these companies to realize continuous innovation and improvement in user experience through rapid any time money (ATM) app. These ATM latter two companies are working to regain competitive advantages in the evolving web using data science techniques including natural language processing (NLP) and machine learning (ML)
Problem
I want to get only ML, not ATM, because ATM is used in its short form only after its expansion. Is there a way to do so?
Here is an example of how to project annotations using a simplified definition detection. Does that help?
PACKAGE uima.example;
DECLARE AbbreviationDefinition;
DECLARE AbbreviationLongform;
DECLARE Abbreviation;
STRINGLIST definedAccronyms;
INT expectedWordcount;
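// Start matching at an all-uppercase token in parentheses (the anchor, marked by @),
// store its character length in expectedWordcount, and require exactly that many
// words in front of the opening parenthesis as the long form.
(W[expectedWordcount, expectedWordcount]{-> AbbreviationLongform}
    SPECIAL.ct=="("
    c:@CAP{-> Abbreviation}<-{c{-> expectedWordcount = (c.end-c.begin)};}
    SPECIAL.ct==")"
){-> AbbreviationDefinition};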
// TODO check first characters of Abbreviation and AbbreviationLongform and remove annotations again if required
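// Collect the covered text of every abbreviation found inside a definition,
// then mark all occurrences of those strings in the document.
a:Abbreviation{PARTOF(AbbreviationDefinition) -> ADD(definedAccronyms, a.ct)};
MARKFAST(Abbreviation, definedAccronyms);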
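// Clean-up: remove Abbreviation annotations nested inside a longer one,
// then remove duplicates created by the projection at the same offsets.
Abbreviation->{a:@Abbreviation{-> UNMARK(a)} ANY; ANY a:@Abbreviation{-> UNMARK(a)};};
a:Abbreviation{CONTAINS(Abbreviation,2,2) -> UNMARK(a)};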
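The rules above mark the definitions and project the abbreviations onto the whole document; they stop short of comparing where a short form is used with where it is defined. As a rough illustration of that remaining comparison, here is a minimal sketch in plain Python rather than Ruta. It is not part of the script above: the function name and the regular expressions are assumptions made for this example, and it skips the long-form word count check.

```python
import re


def abbreviations_used_before_definition(text):
    """Return abbreviations whose first standalone use precedes their first "(ABBR)" definition."""
    # Offset of the first parenthesized all-caps token for each abbreviation, e.g. "(ML)".
    first_definition = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        first_definition.setdefault(match.group(1), match.start(1))

    early = []
    for abbr, definition_offset in first_definition.items():
        # A standalone use is the same token that is not wrapped in parentheses.
        use = re.search(r"(?<![\w(])" + re.escape(abbr) + r"(?![\w)])", text)
        if use is not None and use.start() < definition_offset:
            early.append(abbr)
    return early


sample = (
    "ML is an application of artificial intelligence. "
    "He found an automated teller machine (ATM). "
    "These ATM companies use natural language processing (NLP) "
    "and machine learning (ML)"
)
print(abbreviations_used_before_definition(sample))  # ['ML']
```

On this shortened version of the sample input, ATM is skipped because its first standalone use comes after "(ATM)", and NLP is skipped because it never appears outside its definition.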
DISCLAIMER: I am a developer of UIMA Ruta