UIMA Ruta: inconsistent word tagging


I'm tagging hyphenated words (HyphenizationWord) such as off-line, New-list, VBSE-in, etc. using:

(SW|CW|CAP) HYPHEN (SW|CW|CAP) HYPHEN (SW|CW|CAP) {-PARTOF(HyphenizationWord) ->MARK(ThreeHyphenizationWord,1,5)};
(SW|CW|CAP) HYPHEN (SW|CW|CAP)  {-PARTOF(HyphenizationWord),-PARTOF(ThreeHyphenizationWord) ->MARK(HyphenizationWord,1,3),MARK(PreHyphenizationWords,1),MARK(PosHyphenixationWords,3)};

I also want to tag the unhyphenated forms of the same words, such as "off line" and "New list". But my script wrongly tags some other combinations, such as "off in" and "VBSE line".

DECLARE ComplexPreWord,ComplexPostWord;
//BLOCK (foreach) HyphenizationWord{}
//{
 STRING PreWord;
STRINGLIST PreWordList;
PreHyphenizationWords{-> MATCHEDTEXT(PreWord), ADD(PreWordList, PreWord)};
W {INLIST(PreWordList)->ComplexPreWord};

STRING PostWord;
STRINGLIST PostWordList;
PosHyphenixationWords{-> MATCHEDTEXT(PostWord), ADD(PostWordList, PostWord)};
W {INLIST(PostWordList)->ComplexPostWord};
//}

ComplexPreWord ComplexPostWord{->MARK(ComplexWord,1,2)};

Is there any way to fix this?


Solution

  • I do not know if I understood your question correctly, but maybe this is what you want:

    DECLARE Hyphen;
    SPECIAL.ct == "-"{-> Hyphen};
    
    DECLARE HyphenizationWord, PreHyphenizationWords, PosHyphenixationWords;
    DECLARE HyphenizationWord ThreeHyphenizationWord;
    
    (W @Hyphen{-PARTOF(HyphenizationWord)} W Hyphen W){-> ThreeHyphenizationWord};
    (W{-> PreHyphenizationWords} @Hyphen{-PARTOF(HyphenizationWord)} W{-> PosHyphenixationWords}){-> HyphenizationWord};
    
    STRINGLIST hyphenizationWordList;
    STRING mt;
    HyphenizationWord{-> MATCHEDTEXT(mt), ADD(hyphenizationWordList, replaceAll(mt, "[- ]", ""))};
    
    DECLARE ComplexWord;
    MARKFAST(ComplexWord,hyphenizationWordList);
    

    The script starts with your rules (rewritten). Then the covered text of each HyphenizationWord annotation is stored in a list, with the dashes and spaces removed first. Finally, this list is used in a dictionary lookup with MARKFAST.

    DISCLAIMER: I am a developer of UIMA Ruta
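
To see why the question's script over-generates while the answer's approach does not, here is a minimal sketch in plain Python (not Ruta). It only illustrates the logic and does not reproduce MARKFAST's actual matching behaviour. The sample text, the regular expressions and all function names are assumptions made up for this illustration. Keeping separate lists of first parts and second parts lets any first part combine with any second part, whereas storing each hyphenated word as one normalized entry only matches the pairs that really occurred.

```python
import re

# Made-up sample text for illustration only.
TEXT = ("Use off-line mode with the New-list and VBSE-in options. "
        "The off line mode and the New list differ, "
        "but off in and VBSE line must not match.")

WORD = re.compile(r"[A-Za-z]+")
HYPHENATED = re.compile(r"\b([A-Za-z]+)-([A-Za-z]+)\b")


def adjacent_pairs(text):
    """Yield (first, second) word matches separated by exactly one space."""
    words = list(WORD.finditer(text))
    for a, b in zip(words, words[1:]):
        if text[a.end():b.start()] == " ":
            yield a, b


def collect_normalized(text):
    """Hyphenated words with the hyphen removed, e.g. 'off-line' -> 'offline'."""
    return {m.group(1) + m.group(2) for m in HYPHENATED.finditer(text)}


def find_by_pair(text, normalized):
    """Answer's idea: a word pair matches only if its joined form is a known entry."""
    return [text[a.start():b.end()]
            for a, b in adjacent_pairs(text)
            if a.group() + b.group() in normalized]


def find_by_separate_lists(text):
    """Question's idea: independent lists of first parts and second parts."""
    pre = {m.group(1) for m in HYPHENATED.finditer(text)}
    post = {m.group(2) for m in HYPHENATED.finditer(text)}
    return [text[a.start():b.end()]
            for a, b in adjacent_pairs(text)
            if a.group() in pre and b.group() in post]


if __name__ == "__main__":
    print(find_by_separate_lists(TEXT))
    print(find_by_pair(TEXT, collect_normalized(TEXT)))
```

On this sample text the separate-list version also reports "off in" and "VBSE line", while the pair-based version reports only "off line" and "New list".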