I have a group of words:
"dog", "car", "house", "work", "cat"
I need to be able to match at least 3 of them in a text, for example:
"I always let my cat and dog at the animal nursery when I go to work by car"
Here the regex should match because the text contains at least 3 of the words (4 here):
"cat", "dog", "car" and "work"
I want to use it with Oracle's regexp_like function.
I also need it to work with consecutive words.
Since Oracle's regexp_like doesn't support non-capturing groups or word boundaries, the following expression can be used:
^((.*? )?(dog|car|house|work|cat)( |$)){3}.*$
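As a quick sanity check, the expression above can be tried outside the database. This is only a sketch: it uses Python's re module as a stand-in, on the assumption that it treats these particular constructs (lazy quantifiers, alternation, counted repetition) the same way Oracle does. The helper name has_three_keywords is made up for illustration.

```python
import re

# Pattern copied verbatim from the answer. Python's re is a stand-in here,
# not Oracle's engine; the behaviour is assumed to coincide for this pattern.
PATTERN = re.compile(r"^((.*? )?(dog|car|house|work|cat)( |$)){3}.*$")


def has_three_keywords(text: str) -> bool:
    """Return True if at least three space-delimited keywords occur in text."""
    return PATTERN.match(text) is not None


samples = [
    "I always let my cat and dog at the animal nursery when I go to work by car",
    "dog cat car",            # consecutive words
    "dog dog dog",            # the same word repeated also counts
    "my dog likes the car",   # only two keywords
    "hotdog scar workhouse",  # keywords only appear inside longer words
]

for s in samples:
    print(has_three_keywords(s), "-", s)
```

The first three samples match and the last two do not. Each repetition has to start either at the beginning of the text or right after a space, which is what stands in for the missing word boundary.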
Alternatively, a larger but arguably cleaner solution is:
^(.*? )?(dog|car|house|work|cat) (.*? )?(dog|car|house|work|cat) (.*? )?(dog|car|house|work|cat)( .*)?$
NOTE: These will both match the same word used multiple times, e.g. "dog dog dog".
EDIT: To address the concerns over punctuation, a small modification can be made. It isn't perfect, but it should cover most situations where a word is followed by punctuation (it still won't match a word preceded by punctuation, e.g. !dog):
^((.*? )?(dog|car|house|work|cat)([ ,.!?]|$)){3}.*$
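The effect of the punctuation variant can be checked the same way. Again, this is a rough sketch using Python's re module rather than Oracle itself (assuming the two agree on this pattern), and has_three_keywords_punct is a hypothetical helper name.

```python
import re

# Punctuation-tolerant pattern copied verbatim from the answer.
# Python's re is a stand-in for Oracle's engine here (an assumption).
PATTERN_PUNCT = re.compile(
    r"^((.*? )?(dog|car|house|work|cat)([ ,.!?]|$)){3}.*$"
)


def has_three_keywords_punct(text: str) -> bool:
    """Return True if three keywords occur, each followed by a space,
    one of , . ! ? or the end of the text."""
    return PATTERN_PUNCT.match(text) is not None


samples = [
    "I have a dog, a cat and a car.",  # trailing punctuation is accepted
    "!dog, cat, car",                  # "!dog" is not counted, so only two
    "!dog, cat, car, work",            # three keywords besides "!dog"
]

for s in samples:
    print(has_three_keywords_punct(s), "-", s)
```

The first and third samples match. The second does not, because a keyword still has to start at the beginning of the text or directly after a space, or directly after the previous keyword's trailing punctuation.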