UIMA RUTA: Italics


Does anyone know how I can search for all italicized words in a text? As an extension, how can I search for specific words that are (or are not) italicized?

For example, given "I am certain that I am not mistaken", I'd like to extract "certain", or to extract every "am" that is not italicized.


Solution

  • Assuming that the formatting information is present in the CAS, e.g., by applying the HtmlAnnotator (in combination with the HtmlConverter) provided by Ruta, the rules could look like this (as indicated in a comment on the question):

    I{-> MyType};
    SW.ct=="am"{-PARTOF(I) -> MyType};
    

    You may need to import the HtmlTypeSystem of Ruta.

    DISCLAIMER: I am a developer of UIMA Ruta
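
    Putting the pieces together, a complete script covering the three cases from the question might look like the sketch below. It rests on several assumptions: the HtmlAnnotator has already added the formatting annotations (such as I for italic spans), the type system descriptor is importable as org.apache.uima.ruta.engine.HtmlTypeSystem, and the type names ItalicWord, ItalicCertain, and PlainAm are hypothetical, chosen only for illustration.

    ```
    // Assumption: this is where the HtmlTypeSystem descriptor lives;
    // adjust the path if your setup differs.
    TYPESYSTEM org.apache.uima.ruta.engine.HtmlTypeSystem;

    // Hypothetical result types, used only in this sketch.
    DECLARE ItalicWord, ItalicCertain, PlainAm;

    // Every word that lies inside an italic span.
    W{PARTOF(I) -> MARK(ItalicWord)};

    // A specific word, only where it is italicized.
    W.ct=="certain"{PARTOF(I) -> MARK(ItalicCertain)};

    // A specific word, only where it is not italicized.
    SW.ct=="am"{-PARTOF(I) -> MARK(PlainAm)};
    ```

    The first rule annotates each word separately rather than the whole italic span, which is probably closer to "all words that are italicized" than marking I itself.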