
FirstToken is not found for some references (UIMA Ruta)


FirstToken is not found for some references (those that end with a space).

Script:

DECLARE FirstToken, LastToken;

BLOCK(InRef) Reference{}{
    ANY{POSITION(Reference,1) -> MARK(FirstToken)};
    Document{-> MARKLAST(LastToken)};
}

Input Files:

1.  Ferreira, F.R., Prado, S.D., Carvalho, M.C, and Kraemer, F.B. (2015). Biopower and biopolitics in the field of food and nutrition. Revista de Nutrição, 28(1), 109-119. Available at http://dx.doi.org/10.1590/1415-52732015000100010. 
2.  Ali, S. (2007). Feminism and postcolonialism: Knowledge/politics. Ethnic and Racial Studies, 30(2), 191–212.  
3.  Forbes, D.A., King, K.M., Kushner, K.E., Letourneau, N.L., Myrick, A.F., and Profetto-McGrath, J. (1999). Warrantable evidence in nursing science. Journal of Advanced Nursing, 29(2), 373–379.

Solution

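  • Annotations that start or end with something invisible are themselves invisible. This definition may sound unintuitive, but it is required for sequential matching. A Reference that ends with a space is therefore invisible, so the BLOCK over Reference never matches it and no FirstToken is created.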

    This happens most often when an annotation starts or ends with a space. It is recommended to trim these spaces from the annotations, e.g., with:

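    RETAINTYPE(WS); // make whitespace visible; or RETAINTYPE(SPACE, BREAK,...);
    Reference{-> TRIM(WS)}; // remove leading/trailing whitespace from each Reference
    RETAINTYPE; // restore the default visibility settings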
    

    You can also work on annotations that end with a space if you make spaces visible:

    RETAINTYPE(SPACE);
    

    Besides that, you can use the MARKFIRST action, analogous to the MARKLAST action, instead of the POSITION condition, which is extremely slow.

    DISCLAIMER: I am a developer of UIMA Ruta
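The two suggestions above can be combined into one script. This is a minimal sketch, assuming that the Reference type is declared or imported elsewhere (as in the question) and that Reference annotations already exist when these rules run: whitespace is trimmed from each Reference first, and then MARKFIRST and MARKLAST replace the slow POSITION condition.

```ruta
DECLARE FirstToken, LastToken;

// Make whitespace visible so that Reference annotations ending with a space
// can be matched at all, then remove the leading/trailing whitespace from them.
RETAINTYPE(WS);
Reference{-> TRIM(WS)};
RETAINTYPE; // back to the default visibility settings

// Assumption: Reference is declared elsewhere (not part of this sketch).
BLOCK(InRef) Reference{}{
    // Inside the block, Document refers to the current Reference annotation.
    Document{-> MARKFIRST(FirstToken)};
    Document{-> MARKLAST(LastToken)};
}
```

Because the trimmed references no longer end with an invisible space, the BLOCK should now be executed for all three input references.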