regex, regular-language

Regular expression to filter URLs with strings repeated 3 times or more


I want to filter URLs that contain a word repeated three or more times. I tried (\b\w+\b)\W+\1{3,} and (\b\w+\b)\W{3,}+\1, but neither works.

http://rubular.com/r/6IyCPyBiuW -> (\b\w+\b)\W+\1 -> this finds a word that appears twice in a row, but I want to find words repeated three or more times.

http://rubular.com/r/O9NcobUsTX -> (\b\w+\b)\W+\1{3,} -> this does not find words repeated three or more times.
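A likely reason the second attempt fails: the {3,} quantifier binds only to the back-reference \1, so the pattern expects one separator followed by the word written three or more times with nothing in between. The sketch below checks this with Python's `re` module rather than Rubular (which uses Ruby); the URLs are made-up examples, not taken from the question.

```python
import re

# Pattern from the question. {3,} applies to \1 alone, not to "\W+\1".
attempt = re.compile(r"(\b\w+\b)\W+\1{3,}")

# Hypothetical URLs, chosen only to illustrate the behaviour.
separated = "http://example.com/foo/foo/foo/foo"  # word repeated with "/" between
glued = "http://example.com/foo/foofoofoo"        # word repeated with no separator

print(attempt.search(separated) is not None)  # False: "/" interrupts the run of \1
print(attempt.search(glued) is not None)      # True: "foo", "/", then "foo" three times
```

To repeat the separator together with the word, the quantifier has to apply to a group that contains both.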


Solution

  • The following regular expression works:

    (\w+\W)\1{2,}
    

    The above also captures the non-word character, so the same separator must follow every occurrence of the word exactly. Alternatively, you could use the rather ugly-looking

    (\w+)(?:\W+\1){2,}
    

    Details:

    \w    -> single word character
    \w+   -> one or more word characters
    \W    -> non-word character
    \1    -> back-reference to capturing group #1 (in this case, (\w+))
    {2,}  -> 2 or more repetitions of the preceding group (in this case, (?:\W+\1))
    (?:)  -> grouping without actually capturing anything
    

    http://rubular.com/r/Trb41xxCAt
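As a rough illustration of how the two patterns differ, here is a minimal sketch using Python's `re` module (an assumption; the links above use Rubular, which runs Ruby). The names `strict`, `flexible`, and `has_triple_repeat`, and the sample URLs, are hypothetical.

```python
import re

# First pattern: the word and one trailing separator are captured together,
# so that exact "word + separator" unit must occur three or more times.
strict = re.compile(r"(\w+\W)\1{2,}")

# Second pattern: only the word is captured; each further occurrence may be
# preceded by any run of non-word characters.
flexible = re.compile(r"(\w+)(?:\W+\1){2,}")

def has_triple_repeat(url, pattern=flexible):
    """Return True if the pattern finds a word repeated three or more times."""
    return pattern.search(url) is not None

# Hypothetical URLs for illustration only.
trailing = "http://example.com/foo/foo/foo/bar"  # third "foo" is followed by "/"
at_end = "http://example.com/foo/foo/foo"        # third "foo" has no separator after it
twice = "http://example.com/foo/foo/bar"         # only two repetitions

print(has_triple_repeat(trailing, strict), has_triple_repeat(trailing))  # True True
print(has_triple_repeat(at_end, strict), has_triple_repeat(at_end))      # False True
print(has_triple_repeat(twice, strict), has_triple_repeat(twice))        # False False
```

The second URL shows why the first pattern can miss a repetition at the end of a string: the last occurrence has no separator after it. Neither pattern uses \b, so a repeated suffix of a longer word (for example the `a` in `ba/a/a`) would also count as a match.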