regex, regexp-like

How to detect the number of distinct words in a string with Regex?


I would like to detect whether a string contains multiple different words, and to limit the number of distinct words it may contain. A word can consist of any characters except spaces.

E.g.: I want to check if the following strings have no more than three distinct words:

lorum                               -> True
lorum ipsum                         -> True
lorum ipsum dolor                   -> True
lorem lorem ipsum dolor ipsum ipsum -> True
lorem lorem <=>                     -> True
1 2 3                               -> True

lorem ipsum dolor sit lorum         -> False
lorem ipsum dolor sit               -> False
1 2 3 4                             -> False

Solution

  • To my great surprise, this is actually achievable with a regular expression. It is really ugly and inefficient, but it works.

    You should probably not use it though: this is not the right tool for this job.

    /^(\S*)(?: \1)*(?:(?: (\S*))(?: \1| \2)*(?: (\S*))?)?(?: \1| \2| \3)*$/gm
    

    https://regex101.com/r/0cgoFF/1
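
To make the "not the right tool" remark concrete, here is a minimal Python sketch that runs the pattern above against the question's samples and compares it with a plain set-based count. The function names `at_most_three_regex` and `at_most_n_distinct` are made up for this illustration, and it assumes Python's `re` module, where a backreference to a group that did not take part in the match simply fails (other engines, such as JavaScript's, treat that case differently).

```python
import re

# The pattern from the answer as a Python raw string. The /g and /m flags of
# the original literal are dropped because each string is tested on its own.
AT_MOST_THREE = re.compile(
    r"^(\S*)(?: \1)*(?:(?: (\S*))(?: \1| \2)*(?: (\S*))?)?(?: \1| \2| \3)*$"
)


def at_most_three_regex(text: str) -> bool:
    """Return True if the regex accepts the string (at most three distinct words)."""
    return AT_MOST_THREE.match(text) is not None


def at_most_n_distinct(text: str, limit: int = 3) -> bool:
    """Count distinct whitespace-separated words with a set instead of a regex."""
    # Assumption: str.split() treats any run of whitespace as one separator,
    # whereas the regex expects exactly one space between words.
    return len(set(text.split())) <= limit


if __name__ == "__main__":
    samples = [
        "lorum",
        "lorem lorem ipsum dolor ipsum ipsum",
        "lorem lorem <=>",
        "lorem ipsum dolor sit lorum",
        "1 2 3 4",
    ]
    for sample in samples:
        print(sample, at_most_three_regex(sample), at_most_n_distinct(sample))
```

The set-based version also generalizes to any limit by changing `limit`, while the regex would need another capture group and another alternation branch for every extra word allowed.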