I have a string that I want to break into parts at every semicolon (;).
I'm using Java's String.split(regex) for that, which creates an array of strings.
EXAMPLE:
string 1;
string 2;
string 3;
string 4 (
substring 1;
substring 2;
substring 3;
);
string 4;
Right now I'm using line.split("\\s*;\\s*");
But that, as expected but not wanted, gives me back ["string 1", "string 2", "string 3", "string 4 (\nsubstring 1", "substring 2", "substring 3", ")", "string 4"].
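A minimal reproduction of the current approach, assuming the input uses \n line breaks. String.split(regex) behaves like split(regex, 0), which discards trailing empty strings, so the final ; does not produce an extra empty element. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class SplitDemo {
    // The question's current approach: split at every ';' and swallow surrounding whitespace.
    static String[] splitEverySemicolon(String line) {
        // split(regex) is split(regex, 0): trailing empty strings are dropped.
        return line.split("\\s*;\\s*");
    }

    public static void main(String[] args) {
        // Sample input from the question, assuming '\n' line breaks.
        String line = "string 1;\nstring 2;\nstring 3;\nstring 4 (\n"
                + "substring 1;\nsubstring 2;\nsubstring 3;\n);\nstring 4;";
        String[] parts = splitEverySemicolon(line);
        System.out.println(parts.length); // 8
        System.out.println(Arrays.toString(parts));
    }
}
```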
So how do I match every ; so that I can split at it, except for the ones inside the parentheses (the ones after the substrings)?
EDIT:
I did manage to create a regex that matches ";" inside the parentheses, but not one that matches it outside them. Then, by applying De Morgan's law and converting ~(a ^ b) into ~a v ~b, I made a regex that matches ";" outside the parentheses.
But it still doesn't work and still splits at every semicolon. Is it something with Java itself?
Current Pattern: ((?<![\S\s]*?\([\S\s]*?)|(?![\S\s]*?\)[\S\s]*?));
I'm sure some Java pros have much better solutions than regular expressions, but this expression might be close to what you're looking for:
.*\((?:\s*(?:[^\r\n]*;)\s*)+\);|[^\r\n]+
You'd then likely want to trim each match and push it to an array.
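The trim-and-collect step might look like this minimal sketch, assuming \n line breaks and parenthesized blocks that are not nested. It reuses the answer's expression unchanged; the helper name toStatements is hypothetical. Matches are gathered in a List first because their number isn't known up front, then converted to String[].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CollectMatches {
    // The answer's expression: a "name (" line with ';'-terminated lines and a closing ");",
    // or any single non-empty line.
    private static final Pattern STATEMENT =
            Pattern.compile(".*\\((?:\\s*(?:[^\\r\\n]*;)\\s*)+\\);|[^\\r\\n]+");

    // Hypothetical helper: collects every match, trimmed, into a String[].
    static String[] toStatements(String input) {
        List<String> parts = new ArrayList<>();
        Matcher matcher = STATEMENT.matcher(input);
        while (matcher.find()) {
            String part = matcher.group().trim();
            if (!part.isEmpty()) { // skip lines that contained only whitespace
                parts.add(part);
            }
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "string 1;\nstring 2;\nstring 3;\nstring 4 (\n"
                + " substring 1;\n substring 2;\n substring 3;\n);\nstring 4;";
        for (String part : toStatements(input)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Note that, unlike split, this keeps the trailing ; on each part.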
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpression {
    public static void main(String[] args) {
        // Either a "name (" line followed by ';'-terminated lines and a closing ");",
        // or any single non-empty line.
        final String regex = ".*\\((?:\\s*(?:[^\\r\\n]*;)\\s*)+\\);|[^\\r\\n]+";
        final String string = "string 1;\n"
                + "string 2;\n"
                + "string 3;\n"
                + "string 4 (\n"
                + " substring 1;\n"
                + " substring 2;\n"
                + " substring 3;\n"
                + ");\n"
                + "string 4;\n"
                + "string 1;\n"
                + "string 2;\n"
                + "string 3;\n"
                + "string 4 (\n"
                + " substring 1;\n"
                + " substring 2;\n"
                + " substring 3;\n"
                + ");\n"
                + "string 4;";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            // The regex only has non-capturing groups, so groupCount() is 0 and this loop prints nothing.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
Output:
Full match: string 1;
Full match: string 2;
Full match: string 3;
Full match: string 4 (
 substring 1;
 substring 2;
 substring 3;
);
Full match: string 4;
Full match: string 1;
Full match: string 2;
Full match: string 3;
Full match: string 4 (
 substring 1;
 substring 2;
 substring 3;
);
Full match: string 4;
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com, where you can also watch how it matches against some sample inputs.
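If you'd rather keep String.split, here is a sketch of a different technique from the match-based expression above. Assuming parentheses are balanced and never nested, a ; sits inside a parenthesized group exactly when the next parenthesis after it is a closing one, and a negative lookahead can test that without any lookbehind. The class, method, and constant names are hypothetical.

```java
import java.util.Arrays;

public class SplitOutsideParens {
    // Split at ';' (plus surrounding whitespace) only if the next parenthesis after it
    // is not ')'. Assumes parentheses are balanced and never nested.
    private static final String OUTSIDE_PARENS_SEMICOLON = "\\s*;\\s*(?![^()]*\\))";

    static String[] splitOutsideParens(String input) {
        return input.split(OUTSIDE_PARENS_SEMICOLON);
    }

    public static void main(String[] args) {
        String input = "string 1;\nstring 2;\nstring 3;\nstring 4 (\n"
                + "substring 1;\nsubstring 2;\nsubstring 3;\n);\nstring 4;";
        System.out.println(Arrays.toString(splitOutsideParens(input)));
    }
}
```

The ; after each substring is followed by ")" with no "(" in between, so the lookahead rejects it; every other ; is a split point. With nested parentheses this breaks down, and walking the characters while counting parenthesis depth would be the safer choice.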