Pattern run-length encoding


I'm trying to find the cleanest way to perform run-length encoding based on patterns. The goal is to compress a string by factoring out a substring made up of several consecutive repetitions of the same pattern.

Original String:

start{3}{3}{3}{3}end

As you can see, there are 4 "{3}" patterns. It's possible to compress this String by expressing the run of 4 "{3}" patterns as $4{3}.

Compressed String I would like to obtain:

start$4{3}end

I tried the String.replaceAll(regex, replacement) method. I know that myString.replaceAll("\\{([^<])\\}", "$1") replaces each whole pattern with just its value, but I can't find a way to detect a run of the same pattern and count its length using regular expressions.

Is using a regular expression a good idea, or is there a better way to do this?


Solution

  • I got the desired output as follows. There is probably a more efficient approach than this, but hopefully it helps.

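        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        String s = "start{3}{3}{3}{3}end";
        String pString = "\\{3\\}";   // regex matching the literal token {3}
        Pattern p = Pattern.compile(pString);
        Matcher m = p.matcher(s);

        // Count every occurrence of the token in the input.
        int count = 0;
        while (m.find()) {
            count++;
        }
        // Replace each token with a placeholder "-", then collapse the run of
        // `count` placeholders into "$<count>{3}". This only works if the input has
        // no "-" of its own and all occurrences form one consecutive run.
        System.out.println(s.replaceAll(pString, "-").replaceFirst("-{" + count + "}", "\\$" + count + pString));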
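
The snippet above counts every {3} in the string and relies on a "-" placeholder, so it only works when all occurrences sit in one consecutive run and the input contains no "-". As a possible generalization (a different technique from the one above, not a definitive implementation), here is a minimal sketch that uses a regex backreference to find runs of any repeated {...} token. The class name PatternRle, the method compress, and the assumption that tokens never contain nested braces are all hypothetical choices made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternRle {
    // Assumption: a token is "{...}" with no nested braces. The backreference \1+
    // requires at least one immediate repetition, so single tokens are left alone.
    private static final Pattern RUN = Pattern.compile("(\\{[^{}]*\\})\\1+");

    // Hypothetical helper: rewrites each run of N identical tokens as "$N{...}".
    static String compress(String input) {
        Matcher m = RUN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group(1);
            // The whole match is the token repeated, so the ratio is the run length.
            int runLength = m.group().length() / token.length();
            // quoteReplacement keeps '$' and '\' literal in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement("$" + runLength + token));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("start{3}{3}{3}{3}end")); // prints start$4{3}end
    }
}
```

Because each run is matched separately, an input such as a{1}{1}b{2}{2}{2} would be rewritten run by run, and tokens that are not repeated are copied through unchanged.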