
Regex to replace a repeating string pattern


I need to replace a repeated pattern within a word with its basic repeating unit. For example, I have the string "TATATATA" and I want to replace it with "TA". I would probably also only replace patterns that repeat more than twice, to avoid altering normal words.

I am trying to do it in Java with the replaceAll method.


Solution

  • I think you want this (works for any length of the repeated string):

    String result = source.replaceAll("(.+)\\1+", "$1");
    

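    Or, to prefer the shortest repeating unit (reluctant +?). This is the version that turns "TATATATA" into "TA"; the greedy version above picks the longest unit that repeats and returns "TATA":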

    String result = source.replaceAll("(.+?)\\1+", "$1");
    

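    It first matches a group of one or more characters, then requires that same group to appear again one or more times (using the back-reference \\1 inside the pattern itself). The whole run is replaced by a single copy of the group ($1). Note that . matches any character, not just letters, so repeated spaces and punctuation are collapsed too. I tried it and it seems to do the trick.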


    Example

    String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";
    
    System.out.println(source.replaceAll("(.+?)\\1+", "$1"));
    
    // HEY dude what's up? Trolo ye .0
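A minimal self-contained sketch, assuming Java 8 or later, that puts the variants side by side: the greedy and reluctant patterns from the answer, plus a reluctant pattern with {2,} instead of +, which only fires when the unit occurs at least three times in a row. That last pattern is one possible reading of the question's "more than 2 repetitions" requirement, not something the answer states. The class name RepeatCollapse and the method collapse are hypothetical. The patterns are compiled once; per the String.replaceAll Javadoc, str.replaceAll(regex, repl) gives the same result as Pattern.compile(regex).matcher(str).replaceAll(repl), so precompiling only avoids recompiling the regex on every call.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the regex variants discussed above.
public class RepeatCollapse {

    // Greedy: the group tries the longest unit first, so it keeps the
    // longest unit that is immediately repeated.
    static final Pattern GREEDY = Pattern.compile("(.+)\\1+");

    // Reluctant: the group tries the shortest unit first.
    static final Pattern RELUCTANT = Pattern.compile("(.+?)\\1+");

    // Reluctant, but only when the unit occurs at least 3 times in a row
    // (the unit itself plus 2 or more back-referenced copies).
    // Assumption: this is how "more than 2 repetitions" is interpreted.
    static final Pattern AT_LEAST_THREE = Pattern.compile("(.+?)\\1{2,}");

    // Replace every run matched by the pattern with one copy of its unit ($1).
    static String collapse(Pattern p, String s) {
        return p.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String tata = "TATATATA";
        System.out.println(collapse(GREEDY, tata));         // TATA
        System.out.println(collapse(RELUCTANT, tata));      // TA
        System.out.println(collapse(AT_LEAST_THREE, tata)); // TA

        String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";
        // Two-copy runs such as "HEY HEY" and "yeye" are left alone here.
        System.out.println(collapse(AT_LEAST_THREE, source));
        // HEY HEY dude what's up? Trolo yeye .0

        System.out.println(collapse(RELUCTANT, "committee"));      // comite
        System.out.println(collapse(AT_LEAST_THREE, "committee")); // committee
    }
}
```

The "committee" lines show why a threshold can matter: the plain reluctant pattern also collapses ordinary double letters, while the three-or-more variant leaves them intact.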