I'm trying to process strings with repeated chars in order to find the correct word in a dictionary.
The approach I must use is to find words with 3 or more consecutive identical letters and reduce each such run to 2 consecutive letters.
Then I'll check whether the resulting word exists in the dictionary. If it doesn't, I must reduce the two consecutive letters to a single letter.
Example:
gooooooood -> good (this exists)
awesooooome -> awesoome (this doesn't exist) -> awesome (this exists)
aaawwwesooooooommmme -> aawwesoomme (this doesn't exist) -> awesome (this exists)
I'm working with Java and I'm already using this regular expression to find the words with 3 or more repeated letters in a string:
Pattern p = Pattern.compile("\\b\\w*(\\w)\\1{2}\\w*");
You can use this regex ("pure version"):
(\b\w*?)(\w)\2{2,}(\w*)
String version:
"(\\b\\w*?)(\\w)\\2{2,}(\\w*)"
You should use replaceAll(regex, "$1$2$2$3")
(\b\w*?) // capture group 1 is lazy
(\w) // capture group 2 captures the first occurrence of the char
\2{2,} // the same char must repeat 2 or more additional times (3 or more in total)
(\w*) // capture group 3
Note that $number in the replacement refers to the contents of the corresponding capture group.
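For what it's worth, here is a minimal sketch of the whole two-step lookup from the question. It deliberately swaps in a simpler pattern than the one above: because group 3 in the regex above consumes the rest of the word, a single replaceAll pass with it seems to shorten only the first run in each word (aaawwwesooooooommmme becomes aawwwesooooooommmme), whereas (\w)\1{2,} rewrites every run. The class name RepeatedLetters, the method names, and the two-word dictionary are all made up for illustration, and Set.of assumes Java 9 or later.

```java
import java.util.Set;

// Hypothetical helper class; the names here are illustrative, not from the question.
class RepeatedLetters {

    // Collapse every run of 3 or more identical word characters to exactly 2.
    static String collapseToTwo(String word) {
        return word.replaceAll("(\\w)\\1{2,}", "$1$1");
    }

    // Collapse every run of 2 or more identical word characters to exactly 1.
    static String collapseToOne(String word) {
        return word.replaceAll("(\\w)\\1+", "$1");
    }

    // Try the two-letter form first, then fall back to the one-letter form.
    // Returns null when neither candidate is in the dictionary.
    static String normalize(String word, Set<String> dictionary) {
        String two = collapseToTwo(word);
        if (dictionary.contains(two)) {
            return two;
        }
        String one = collapseToOne(two);
        return dictionary.contains(one) ? one : null;
    }

    public static void main(String[] args) {
        // Stand-in dictionary (an assumption); replace with your real word list.
        Set<String> dictionary = Set.of("good", "awesome");

        System.out.println(normalize("gooooooood", dictionary));           // good
        System.out.println(normalize("awesooooome", dictionary));          // awesome
        System.out.println(normalize("aaawwwesooooooommmme", dictionary)); // awesome
    }
}
```

One caveat with this fallback: it collapses all doubled letters at once, so a word that needs one double kept and another removed is not found. For example, helllooo becomes helloo and then helo, never hello. If that matters, you would have to try each doubled run separately. Also note that \w matches digits and underscores too; use [a-zA-Z] or \p{L} if only letters should be collapsed.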