java regex sequence fsm

Regular expression to find a character repeated exactly 5 times


I'm trying to build a finite state machine and I want to check the sequence that I get with a regular expression. I need to check whether the sequence has the following form:

For example:

"A,B,C,C,C,C,C,A" -> is accepted.

"A,B,C,C,C,C,A" -> is ignored.

"A,B,C,C,C,C,C,C,A" -> is ignored.

I found this post and that post, but everything I tried simply doesn't work.

I tried the following: A\B\D{5}\A, ABD{5}A and a couple more, but again with no success.

EDIT: I want to know whether the character C is repeated exactly 5 times. What comes before and after doesn't matter at all, so it could also look like this:

A,A,A,F,F,R,E,D,C,C,C,C,C, ......

Don't consider the commas.

The problem is that I need to find out whether a sequence is accepted, but the sequence has the following form: A,B, C*10. I created the machine class, the state class and the event class. But now I need to know whether I have exactly 5 repetitions of C, and it is causing me a lot of problems.

EDIT: It's not working; see the code I've added.

String sequence1 = "A,B,C,C,C,C,A";
String sequence2 = "A,B,C,C,C,C,C,A";
String sequence3 = "A,B,C,C,C,C,C,C,A";
Pattern mPattern = Pattern.compile("(\\w)(?:,\\1){4}");
Matcher m1 = mPattern.matcher(sequence1);
m1.matches(); // false
Matcher m2 = mPattern.matcher(sequence2);
m2.matches(); // false
Matcher m3 = mPattern.matcher(sequence3);
m3.matches(); // false

It always returns false.

How can I achieve this?

Thanks.


Solution

  • Your regex is not working because you are not considering the commas in your string, which I assume are present.

    You can try the following regex (this is a generalized pattern; you can modify it as needed): -

    "(\\w)(?:,\\1){4}"
    

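    This will match any sequence of 5 identical characters separated by commas. Because the pattern covers only part of the input, search with Matcher.find(); Matcher.matches() requires the whole string to match, which is why the code in your edit always returns false.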

    \1 is a backreference to the first captured character; the remaining 4 characters must be the same as it.

    Explanation: -

    "(         // 1st capture group
       \\w     // Start with a character
     )
     (?:       // Non-capturing group
        ,      // Match `,` after `C`
        \\1    // Backreference to 1st capture group. 
               // Match the same character as in (\\w)
     ){4}"     // Group close. Match 4 times 
               // As 1st one we have already matched in (\\w)
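
    As a minimal sketch of how this pattern behaves in Java (the class and method names are made up for illustration and are not from the question): `Matcher.find()` searches for the pattern anywhere in the input, while `Matcher.matches()` succeeds only when the whole input is the run.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class for illustration only.
    class RunFinder {
        // Same pattern as above: one word character followed by 4 more copies.
        static final Pattern RUN = Pattern.compile("(\\w)(?:,\\1){4}");

        // True if the input contains a run of at least 5 equal tokens.
        static boolean containsRun(String input) {
            return RUN.matcher(input).find();
        }

        // True only if the entire input is such a run.
        static boolean isOnlyRun(String input) {
            return RUN.matcher(input).matches();
        }

        public static void main(String[] args) {
            System.out.println(containsRun("A,B,C,C,C,C,A"));     // false (only 4 Cs)
            System.out.println(containsRun("A,B,C,C,C,C,C,A"));   // true
            System.out.println(containsRun("A,B,C,C,C,C,C,C,A")); // true (6 Cs contain 5)
            System.out.println(isOnlyRun("A,B,C,C,C,C,C,A"));     // false ("A,B," and ",A" are extra)
            System.out.println(isOnlyRun("C,C,C,C,C"));           // true
        }
    }
    ```

    Note that with `find()` this pattern also accepts runs longer than 5, since any longer run contains a run of 5.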
    

    UPDATE: -

    If you want to match a sequence of exactly 5, you can add a negative look-ahead for the matched character after the 5th match: -

    "(\\w)(?:,\\1){4}(?!,\\1)"
    

    (?!,\\1) is a negative look-ahead assertion. The pattern will match 5 consecutive identical characters that are not followed by the same character.
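
    A minimal sketch of what the look-ahead version accepts when used with `find()` (the class and method names are hypothetical). The third call shows the remaining gap: with 6 Cs the look-ahead rejects a match starting at the first C, but a match can still start at the second C.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper class for illustration only.
    class LookaheadRun {
        // 5 equal tokens that are not followed by a 6th copy.
        static final Pattern RUN = Pattern.compile("(\\w)(?:,\\1){4}(?!,\\1)");

        // Index where the first match starts, or -1 if there is none.
        static int firstMatchStart(String input) {
            Matcher m = RUN.matcher(input);
            return m.find() ? m.start() : -1;
        }

        public static void main(String[] args) {
            System.out.println(firstMatchStart("A,B,C,C,C,C,A"));     // -1
            System.out.println(firstMatchStart("A,B,C,C,C,C,C,A"));   // 4
            System.out.println(firstMatchStart("A,B,C,C,C,C,C,C,A")); // 6 (the second C)
        }
    }
    ```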

    UPDATE: -

    In the above regex, we also need a negative look-behind for \\1, which we can't do. So I came up with this weird-looking regex, which I don't like myself, but you can try it and see whether it works: -

    Not Tested: -

    "(\\w),(?!\\1)(\\w)(?:,\\2){4}(?!,\\2)"
    

    Explanation: -

    (       // First capture group
      \\w   // Any character before your required sequence (e.g. `A` in `A,C,C,C,C,C`)
    )       // Group end
    ,       // Comma after `A`
    
    (?!        // Negative look-ahead
       \\1     // The next character must differ from the one in the first capture group,
               // since we now want a sequence of `C` after `A`
    )
    (          // Capture group 2
       \\w     // First character of the sequence, maybe `C`
    )
    (?:        // Non-capturing group
       ,       // Match comma
       \\2     // Match the 2nd capture group character, which is different from `A`
               // and the same as the one in group 2, maybe `C`
    
    ){4}       // Match 4 times
    
    (?!        // Negative look-ahead
        ,
        \\2    // for the 2nd captured group, `C`
    )
    

    I don't know whether that explanation makes sense, but you can try it. If it works and you can't understand it, I'll try to explain it a little better.
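
    As an alternative sketch that avoids look-around entirely (a different technique from the regexes above, not the answer's method): match each maximal run of equal tokens greedily and check its length. The class and method names are hypothetical, and the length formula assumes single-character tokens separated by commas, as in the question's examples. Unlike the pattern above, this also handles a run at the very start of the input.

    ```java
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper for illustration; assumes single-character tokens
    // separated by commas, as in the question's examples.
    class ExactRun {
        // One token followed by as many ",<same token>" repeats as possible.
        static final Pattern MAXIMAL_RUN = Pattern.compile("(\\w)(?:,\\1)*");

        // True if some maximal run of equal tokens has exactly `length` tokens.
        static boolean hasRunOfExactly(String input, int length) {
            Matcher m = MAXIMAL_RUN.matcher(input);
            while (m.find()) {
                // n one-character tokens plus n - 1 commas span 2n - 1 characters.
                int tokens = (m.end() - m.start() + 1) / 2;
                if (tokens == length) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            System.out.println(hasRunOfExactly("A,B,C,C,C,C,A", 5));     // false
            System.out.println(hasRunOfExactly("A,B,C,C,C,C,C,A", 5));   // true
            System.out.println(hasRunOfExactly("A,B,C,C,C,C,C,C,A", 5)); // false
            System.out.println(hasRunOfExactly("C,C,C,C,C", 5));         // true
        }
    }
    ```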