uima ruta

Annotation not matching if sentence has anything after the pattern


I am trying to extract the identifier shown in the examples below (AN A348645 PLB) with a Ruta script. Please see the examples I have provided.

Below is my code:

Document{->RETAINTYPE(SPACE)};

((W|NUM) (NUM|W|SPACE|SPECIAL)*){REGEXP("([1]{0,1}[A-Z0-9]{2}[\\s ||-]{0,2}[A-Z0-9]{7}[\\s ||-]{0,2}[A-Z]{3})")->MARK(EntityType)};

1)

Input: Claims Experience Report - AN A348645 PLB Nest Holdings Pty Ltd
Expected output: AN A348645 PLB
Actual output: no entity is matched

However, it works when no word or letter follows the pattern:

2)

Input: Claims Experience Report - AN A348645 PLB
Expected Output: AN A348645 PLB
Actual output: AN A348645 PLB


Solution

  • In this example

    AN A348645 PLB Nest Holdings Pty Ltd

    the greedy star quantifier * keeps consuming the annotations that follow PLB, so the REGEXP condition is evaluated on a span that also contains the trailing text (Nest Holdings Pty Ltd), and the pattern no longer matches. The rule therefore fires only when no tokens follow the pattern.

    Instead, try applying the regular expression directly as a Ruta regex rule:

    "([1]{0,1}[A-Z0-9]{2}[\\s ||-]{0,2}[A-Z0-9]{7}[\\s ||-]{0,2}[A-Z]{3})"->EntityType;