These are my possible inputs:
"@smoke"
"@smoke,@Functional1" (OR condition)
"@smoke,@Functional1,@Functional2" (OR condition)
"@smoke","@Functional1" (AND condition),
"@smoke","~@Functional1" (SKIP condition),
"~@smoke","~@Functional1" (NOT condition)
(Please note: the input string for the regex stops at the last " character on each line; no space or comma follows it.)
The regex I came up with so far is
"((?:[~@]{1}\w*)+),?"
This matches in capturing groups for the samples 1, 4, 5 and 6 but NOT 2 and 3.
I am not sure how to tweak it further. Any suggestions? I would also like to capture the boolean prefix of each tag (e.g. ~). If pre-processing the string in Java before applying the regex would make this simpler, I am open to that as well.
Thanks.
It seems that you want to match an optional ~ followed by an @ and get iterative matches for group 1. You could make use of the \G anchor, which matches either at the start of the input or at the end of the previous match.
(?:"(?=.*"$)|\G(?!^))(~?@\w+(?:,~?@\w+)*)"?[,\h]?
Explanation
(?:
Non capture group
"(?=.*"$)
Match " and assert that the string ends with "
|
Or
\G(?!^)
Assert the position at the end of the previous match, not at the start
)
Close non capture group
(
Capture group 1
~?@\w+(?:,~?@\w+)*
Match an optional ~, then @ and 1+ word characters, and repeat 0+ times with a comma prepended
)
Close group 1
"?
Match an optional "
[,\h]?
Optionally match either a comma or a horizontal whitespace char
Example code
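import java.util.regex.Matcher;
import java.util.regex.Pattern;

String regex = "(?:\"(?=.*\"$)|\\G(?!^))(~?@\\w+(?:,~?@\\w+)*)\"?[,\\h]?";
// The six sample inputs, one per line
String string = "\"@smoke\"\n"
        + "\"@smoke,@Functional1\"\n"
        + "\"@smoke,@Functional1,@Functional2\"\n"
        + "\"@smoke\",\"@Functional1\"\n"
        + "\"@smoke\",\"~@Functional1\"\n"
        + "\"~@smoke\",\"~@Functional1\"";
// MULTILINE lets $ in the lookahead match at the end of every line
Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // Group 1 holds the tag list found between one pair of quotes
    System.out.println(matcher.group(1));
}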
Output
@smoke
@smoke,@Functional1
@smoke,@Functional1,@Functional2
@smoke
@Functional1
@smoke
~@Functional1
~@smoke
~@Functional1
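Since the question also asks for the boolean meaning of each tag, here is a minimal sketch of one way to post-process group 1. It assumes the semantics given in the question: each quoted string is AND-ed with the others, commas inside one quoted string mean OR, and a leading ~ negates a tag. The class name TagExpressionParser, the method parse, and the "NOT " rendering are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExpressionParser {
    // Same pattern as in the example code of this answer.
    private static final Pattern GROUPS = Pattern.compile(
            "(?:\"(?=.*\"$)|\\G(?!^))(~?@\\w+(?:,~?@\\w+)*)\"?[,\\h]?",
            Pattern.MULTILINE);

    // Hypothetical helper: returns one inner list per quoted string (AND-ed),
    // each holding the comma-separated alternatives (OR-ed).
    // A negated tag is rendered with a leading "NOT " instead of "~".
    static List<List<String>> parse(String input) {
        List<List<String>> andGroups = new ArrayList<>();
        Matcher m = GROUPS.matcher(input);
        while (m.find()) {
            List<String> orTags = new ArrayList<>();
            for (String tag : m.group(1).split(",")) {
                boolean negated = tag.startsWith("~");
                String name = negated ? tag.substring(1) : tag;
                orTags.add(negated ? "NOT " + name : name);
            }
            andGroups.add(orTags);
        }
        return andGroups;
    }

    public static void main(String[] args) {
        // prints [[@smoke, @Functional1]]
        System.out.println(parse("\"@smoke,@Functional1\""));
        // prints [[@smoke], [NOT @Functional1]]
        System.out.println(parse("\"@smoke\",\"~@Functional1\""));
    }
}
```

In real code you would probably replace the "NOT " strings with a small value type that carries the tag name and a negated flag; strings are used here only to keep the sketch short.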
Edit
If the matches do not need to be consecutive, you could also use:
"(~?@\w+(?:,~?@\w+)*)"
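A minimal sketch of how this shorter pattern could be used with Matcher.find() on a single input line. The class name QuotedTagGroups and the method groups are hypothetical. Note that, unlike the longer pattern, this one does not assert that the line ends with a quote; it simply collects every well-formed quoted tag list it finds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTagGroups {
    // The shorter pattern: one quoted, comma-separated list of optionally negated tags.
    private static final Pattern QUOTED =
            Pattern.compile("\"(~?@\\w+(?:,~?@\\w+)*)\"");

    // Hypothetical helper: collects group 1 of every match in the input.
    static List<String> groups(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [@smoke,@Functional1,@Functional2]
        System.out.println(groups("\"@smoke,@Functional1,@Functional2\""));
        // prints [~@smoke, ~@Functional1]
        System.out.println(groups("\"~@smoke\",\"~@Functional1\""));
    }
}
```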