I am using the regular expression
(\b\w+\b)\W+\1{3,}
to filter URLs that contain a string repeated three or more times. I also tried
(\b\w+\b)\W{3,}+\1
but neither helped.
http://rubular.com/r/6IyCPyBiuW -> (\b\w+\b)\W+\1
-> this finds words that are repeated twice, but I want to find words repeated three or more times.
http://rubular.com/r/O9NcobUsTX -> (\b\w+\b)\W+\1{3,}
-> this doesn't find words repeated three or more times.
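Your pattern fails because the quantifier {3,} applies only to \1, so it requires the word to repeat back to back with no separator in between. The following regular expression works: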
(\w+\W)\1{2,}
The above captures the trailing non-word character as part of the group, so the exact same separator must follow every repetition, including the last one. Alternatively, you could use the rather ugly-looking
(\w+)(?:\W+\1){2,}
Details:
\w -> single word character
\w+ -> one or more word characters
\W -> non-word character
\1 -> back-reference to capturing group #1 (in this case, (\w+))
{2,} -> 2 or more repetitions of (?:\W+\1), i.e. the word occurs three or more times in total
(?:...) -> grouping without actually capturing anything
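
As a quick sanity check, here is a minimal sketch in Python. Python is my choice for illustration; Rubular uses Ruby's engine, but these particular patterns should behave the same there. The sample URLs and the names original, same_sep, any_sep and describe are made up for this example.

```python
import re

# The pattern from the question: {3,} binds only to \1, so the word must
# repeat back to back ("foofoofoo") after a single run of separators.
original = re.compile(r"(\b\w+\b)\W+\1{3,}")

# First suggestion: the separator is part of the captured group, so the
# same separator must follow every occurrence, including the last one.
same_sep = re.compile(r"(\w+\W)\1{2,}")

# Second suggestion: separators sit outside the captured group, so they
# may differ between occurrences and no trailing separator is needed.
any_sep = re.compile(r"(\w+)(?:\W+\1){2,}")

# Hypothetical sample URLs, for illustration only.
urls = [
    "http://example.com/foo/foo/foo/bar",  # three times, trailing "/"
    "http://example.com/foo/foo/foo",      # three times, no trailing "/"
    "http://example.com/foo/foo-foo",      # three times, mixed separators
    "http://example.com/foo/foo/bar",      # only twice
]


def describe(url):
    """Return which of the three patterns find a match in url."""
    return {
        "original": original.search(url) is not None,
        "same_sep": same_sep.search(url) is not None,
        "any_sep": any_sep.search(url) is not None,
    }


for url in urls:
    print(url, describe(url))
```

Only the first sample matches same_sep, the first three match any_sep, and none of them match the pattern from the question.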