I would like to check whether a string contains any word other than some predefined ones. The predefined words are What is, plus, minus, multiplied by, and divided by; some of the phrases include a single space. I've read this post and this one, both using negative lookaheads, but couldn't come up with a pattern that worked.
For example, input text "What is plus abc divided by" should come back as "abc" not recognized.
What would be a correct regex for this?
Edit:
Note that I don't care what the invalid token is, just that it exists. It can be anything, a word or a number. The question can also be thought of as "check if the input contains only allowed words".
Simply join them up in a group:
(?:What is|plus|minus|multiplied by|divided by)
Note that if you have, for example, "multiply" and "multiply by" (i.e. one token that starts with another), "multiply by" must come first:
(?:What is|plus|minus|multiply by|multiply)
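The ordering rule above can be seen in a short Python sketch. The helper name build_alternation is hypothetical (it is not part of the original answer); it simply sorts the tokens longest first so that a token never shadows a longer one that starts with it.

```python
import re

def build_alternation(tokens):
    # Hypothetical helper: put longer tokens first so that "multiply by"
    # is tried before "multiply". re.escape guards against tokens that
    # contain regex metacharacters.
    ordered = sorted(tokens, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in ordered) + ")"

# Shorter alternative listed first: the engine stops at "multiply".
shorter_first = re.compile(r"(?:multiply|multiply by)")
# Longer alternative listed first, via the helper.
longer_first = re.compile(build_alternation(["multiply", "multiply by"]))

text = "multiply by"
print(shorter_first.match(text).group(0))  # multiply
print(longer_first.match(text).group(0))   # multiply by
```

Sorting by length is only one way to get a safe order; listing the alternatives by hand, as in the pattern above, works just as well.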
To check if the string only contains valid tokens, use:
^ # Match at the start of string
\g<token> # a pre-defined token
(?:\s+\g<token>)* # followed by 0 or more tokens
$ # right before the end of string.
...where \g<token> denotes the expression above.
Try it on regex101.com.
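As a rough Python equivalent of the check above: the standard re module does not support \g<token> subroutine calls inside a pattern, so this sketch splices the token group in with string formatting instead. The names TOKEN, ONLY_TOKENS and only_allowed are assumptions made for illustration, and fullmatch stands in for the ^ and $ anchors.

```python
import re

# The token group from the answer, kept as a plain string so it can be reused.
TOKEN = r"(?:What is|plus|minus|multiplied by|divided by)"

# One token, followed by zero or more whitespace-separated tokens.
ONLY_TOKENS = re.compile(rf"{TOKEN}(?:\s+{TOKEN})*")

def only_allowed(text):
    # fullmatch requires the whole string to match, which plays the
    # role of the ^ ... $ anchors in the pattern above.
    return ONLY_TOKENS.fullmatch(text) is not None

print(only_allowed("What is plus divided by"))      # True
print(only_allowed("What is plus abc divided by"))  # False
```

This answers the yes/no version of the question without reporting which token was invalid.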
Since we also need to find the (first) invalid token, match every run of non-whitespace characters and capture the ones that are not matched by the expression above in a group:
(?:What is|plus|minus|multiplied by|divided by)|(\S+)
If the match contains group 1, that means it is a non-recognized token. Output an error accordingly.
Try it on regex101.com.
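A minimal Python sketch of this scan, assuming the same token list; SCANNER and first_invalid are hypothetical names. It walks the matches in order and returns the first one where group 1 took part in the match.

```python
import re

# Known tokens are consumed by the first alternative; anything else
# falls through to (\S+) and is captured in group 1.
SCANNER = re.compile(r"(?:What is|plus|minus|multiplied by|divided by)|(\S+)")

def first_invalid(text):
    for match in SCANNER.finditer(text):
        if match.group(1) is not None:
            return match.group(1)  # first unrecognized token
    return None  # every token was recognized

print(first_invalid("What is plus abc divided by"))  # abc
print(first_invalid("What is plus divided by"))      # None
```

Note that the pattern has no word boundaries, so in this sketch text glued onto a valid token, such as "plusx", is reported as the leftover "x" and not as "plusx".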