How do you mix Ruta types and regexes within a rule?


I want to match street names together with their house numbers. A house number may carry a single trailing letter, and it may also be a range of house numbers.

For example:

Birkenstraße 22b
Birkenstraße 22b-23a
Birkenstraße 22b/23z

For this, I have the following rule in a Ruta script:

(Street PERIOD? ((NUM "b"? (("/"|"-") NUM "b"?)?) {-> MARK(HouseNumber)}));

"b" marks the place where I want to match any letter, as [a-zA-Z] would in a regex. When I replaced "b" with "[a-zA-Z]", no HouseNumber was recognized at all, whereas with "b" the rule does recognize the first part, Birkenstraße 22b, in my examples.

How can I use such a regular expression within a rule in UIMA Ruta?


Solution

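  • A quoted string in a Ruta rule is matched as literal text, so "[a-zA-Z]" is not interpreted as a regex. Instead, I declared a type and assigned it with a REGEXP condition at the beginning of my script: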

    DECLARE CHARS;
    W{REGEXP("[a-zA-Z]") -> MARK(CHARS)};
    

    After that, I added the type CHARS to my rule like this and it worked:

    (Street PERIOD? ((NUM CHARS? (("/"|"-") NUM CHARS?)?) {-> MARK(HouseNumber)}));
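
As a sanity reference for what the HouseNumber part of the rule is meant to cover, here is a minimal sketch in plain Python rather than Ruta. It mirrors the shape `NUM CHARS? (("/"|"-") NUM CHARS?)?` as an ordinary regex and runs it over the three sample addresses from the question. The names `HOUSE_NUMBER` and `house_number` are hypothetical helpers for illustration. The sketch also assumes there is no whitespace inside the house number, which is a simplification compared with Ruta's token-based matching.

```python
import re

# Plain-regex analogue of the house-number part of the Ruta rule:
#   NUM CHARS? (("/"|"-") NUM CHARS?)?
# Assumption: digits, letter, and separator are written without spaces.
HOUSE_NUMBER = re.compile(r"\d+[a-zA-Z]?(?:[/-]\d+[a-zA-Z]?)?")


def house_number(address: str):
    """Return the first house-number span found in the address, or None."""
    match = HOUSE_NUMBER.search(address)
    return match.group(0) if match else None


if __name__ == "__main__":
    samples = [
        "Birkenstraße 22b",
        "Birkenstraße 22b-23a",
        "Birkenstraße 22b/23z",
    ]
    for line in samples:
        print(line, "->", house_number(line))
```

Comparing these spans with the HouseNumber annotations produced by the Ruta script is a quick way to check that the range part after "/" or "-" is picked up as well, not only the leading 22b.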