java, regex, regex-group

Trying to match possible tags in a string with a regex


These are my possible inputs:

"@smoke"
"@smoke,@Functional1" (OR condition)
"@smoke,@Functional1,@Functional2" (OR condition)
"@smoke","@Functional1" (AND condition)
"@smoke","~@Functional1" (SKIP condition)
"~@smoke","~@Functional1" (NOT condition)

(Please note: the input string for the regex stops at the last " character on each line; no space or comma follows it.)

The regex I came up with so far is

"((?:[~@]{1}\w*)+),?"

This captures the tags for samples 1, 4, 5 and 6, but not for 2 and 3.

I am not sure how to tweak it further. Any suggestions? I would also like to capture the prefix that carries the boolean meaning of the tag (e.g. ~). If you have suggestions for pre-processing the string in Java before applying the regex that would make this simpler, I am open to that as well.

Thanks.


Solution

  • It seems that you want to match an optional ~ followed by an @, and get successive matches in capture group 1. You can use the \G anchor, which matches either at the start of the input or at the end of the previous match.

    (?:"(?=.*"$)|\G(?!^))(~?@\w+(?:,~?@\w+)*)"?[,\h]?
    

    Explanation

    • (?: Non capture group
      • "(?=.*"$) Match " and assert that the string ends with "
      • | Or
      • \G(?!^) Assert the position at the end of the previous match, not at the start
    • ) Close non capture group
    • ( Capture group 1
      • ~?@\w+(?:,~?@\w+)* Match an optional ~, then @ and 1+ word characters, then repeat 0+ times the same tag form preceded by a comma
    • )"? Close group 1 and match an optional "
    • [,\h]? Optionally match either a comma or a horizontal whitespace char
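
To see what the \G alternative contributes, here is a minimal sketch that runs the pattern with and without it. The space-separated input is a hypothetical case, not one of the sample inputs from the question, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorDemo {
    // Pattern from the answer: start at an opening quote, or continue
    // at the position where the previous match ended (\G).
    static final Pattern WITH_G = Pattern.compile(
            "(?:\"(?=.*\"$)|\\G(?!^))(~?@\\w+(?:,~?@\\w+)*)\"?[,\\h]?");

    // Same pattern with the \G alternative removed, for comparison only.
    static final Pattern WITHOUT_G = Pattern.compile(
            "\"(?=.*\"$)(~?@\\w+(?:,~?@\\w+)*)\"?[,\\h]?");

    // Collect capture group 1 of every successive match.
    static List<String> groups(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            out.add(matcher.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical input: two tags separated by a space inside one pair of quotes.
        String input = "\"@smoke @Functional1\"";
        System.out.println(groups(WITH_G, input));    // prints [@smoke, @Functional1]
        System.out.println(groups(WITHOUT_G, input)); // prints [@smoke]
    }
}
```

With \G, the second tag is picked up exactly where the first match ended. Without it, every match has to start at a " character, so only @smoke is found.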


    Example code

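    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            // \G continues at the end of the previous match; \h is a horizontal whitespace char
            String regex = "(?:\"(?=.*\"$)|\\G(?!^))(~?@\\w+(?:,~?@\\w+)*)\"?[,\\h]?";
            String string = "\"@smoke\"\n"
                 + "\"@smoke,@Functional1\"\n"
                 + "\"@smoke,@Functional1,@Functional2\"\n"
                 + "\"@smoke\",\"@Functional1\"\n"
                 + "\"@smoke\",\"~@Functional1\"\n"
                 + "\"~@smoke\",\"~@Functional1\"";

            // MULTILINE lets $ match at the end of every line, not only at the end of the input
            Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
            Matcher matcher = pattern.matcher(string);

            // Print capture group 1 for each successive match
            while (matcher.find()) {
                System.out.println(matcher.group(1));
            }
        }
    }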
        
    

    Output

    @smoke
    @smoke,@Functional1
    @smoke,@Functional1,@Functional2
    @smoke
    @Functional1
    @smoke
    ~@Functional1
    ~@smoke
    ~@Functional1
    

    Edit

    If the tags inside each quoted string are always comma-separated, so that no match has to continue from the previous one with \G, you could also use:

    "(~?@\w+(?:,~?@\w+)*)"
    

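
Since the question also asks about processing the string in Java and about keeping the ~ prefix, here is a rough sketch built on the simpler pattern. It assumes the reading given in the question: tags inside one pair of quotes are alternatives (OR), and separate quoted strings must all hold (AND). The class name TagExpressionParser and the helper isNegated are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExpressionParser {
    // The simpler pattern: one quoted, comma-separated tag list per match.
    private static final Pattern QUOTED =
            Pattern.compile("\"(~?@\\w+(?:,~?@\\w+)*)\"");

    // Outer list: one entry per quoted string (assumed AND).
    // Inner list: the tags inside that quoted string (assumed OR).
    static List<List<String>> parse(String input) {
        List<List<String>> groups = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            // Group 1 holds the comma-separated tags without the quotes.
            groups.add(Arrays.asList(matcher.group(1).split(",")));
        }
        return groups;
    }

    // Hypothetical helper: a tag is negated when it carries the ~ prefix.
    static boolean isNegated(String tag) {
        return tag.startsWith("~");
    }

    public static void main(String[] args) {
        String input = "\"@smoke,@Functional1\",\"~@Functional2\"";
        System.out.println(parse(input)); // prints [[@smoke, @Functional1], [~@Functional2]]
        System.out.println(isNegated("~@Functional2")); // prints true
    }
}
```

Keeping the ~ in the captured tag and testing for it afterwards avoids a second capture group, which would only retain its last value when the tag form repeats inside one match.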