I am working with an Excel file that contains several sentences. I would like to generate a new attribute (using the "Generate Attribute" operator) that returns true or false depending on whether the sentence contains numbers separated by white space (e.g. 234 45 56). I used the function “match nominal regex” (matches(sentences,"\d+\s+\d")) to do this. However, RapidMiner does not recognize the escape (\) character. How do I change my regex to make it work?
Some additional comments/examples:
My input sentences:
word word 123 345 6665 23456 54 word word word
word word word 12.3 34.5 6665 23.456 5.4 word word word
word word word 12,3 34,5 6665 23,456 5.4 word word word
word word word 12,3% 34,5% 6665% 23,456% 5.4% word word word
My output should be a new variable that is true or false depending on whether the sentence contains such a chain of numbers.
I first thought of using the following regex to capture the numbers: \d+[.,]?\d*\s+\d+[.,]?\d*
You may express \d as [0-9] and \s as a space. Also, it seems you need to match the full line with matches, so add .* on both sides of the pattern:
matches(sentences,".*[0-9] +[0-9].*")
This matches any 0+ chars other than a newline (as many as possible), followed by a digit, 1+ spaces and a digit, and then again 0+ chars other than a newline.
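The need for the surrounding .* can be illustrated in plain Java. This is only a sketch: it assumes that RapidMiner's matches behaves like Java's String.matches, which has to match the whole string. The class and method names (FullMatchDemo, fullMatch, partialMatch) are made up for the illustration and are not part of RapidMiner.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; assumes RapidMiner's matches() behaves like String.matches().
class FullMatchDemo {

    // True only if the regex matches the ENTIRE string.
    static boolean fullMatch(String text, String regex) {
        return text.matches(regex);
    }

    // True if the regex matches anywhere inside the string.
    static boolean partialMatch(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String sentence = "word word 123 345 word";

        // Without .* the pattern would have to cover the whole sentence.
        System.out.println(fullMatch(sentence, "[0-9] +[0-9]"));     // false
        // A substring search finds the digit, spaces, digit sequence.
        System.out.println(partialMatch(sentence, "[0-9] +[0-9]"));  // true
        // Wrapping the pattern in .* lets a full match succeed as well.
        System.out.println(fullMatch(sentence, ".*[0-9] +[0-9].*")); // true
    }
}
```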
Also, try doubling the \ to match \d or \s (since the regex is Java flavor):
matches(sentences,".*\\d+\\s+\\d.*")
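To see what the doubled backslash does, here is a small Java sketch that runs the same pattern over the sample sentences from the question. It assumes RapidMiner hands the string to the Java regex engine the way a Java string literal would. EscapeDemo and hasSpacedNumbers are hypothetical names used only for this example.

```java
// Hypothetical demo class; not part of RapidMiner.
class EscapeDemo {

    // In Java source, "\\d" is the two-character regex token \d
    // (a backslash followed by d), and "\\s" is the token \s.
    static final String REGEX = ".*\\d+\\s+\\d.*";

    // True if the sentence contains digits, white space, and another digit in a row.
    static boolean hasSpacedNumbers(String sentence) {
        return sentence.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {
            "word word 123 345 6665 23456 54 word word word",
            "word word word 12.3 34.5 6665 23.456 5.4 word word word",
            "word word word 12,3 34,5 6665 23,456 5.4 word word word",
            "word word word 12,3% 34,5% 6665% 23,456% 5.4% word word word"
        };
        for (String s : samples) {
            System.out.println(hasSpacedNumbers(s) + "  " + s);
        }
    }
}
```

With this pattern the first three sentences give true and the last one gives false, because in the percent example every digit run is followed by % rather than by white space. If such sentences should also count, the pattern would need an optional % before the white space (for example \\d+%?\\s+\\d); whether that is wanted is not stated in the question.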