I wrote the following expression to split a string after every x words (3, for instance), each followed by a space. My problem is that I need to keep the entire content, but I cannot find a way to use lookbehind etc. to accomplish this in Java.
Does anyone have experience with that?
int ngramm_length = 3;
String text = "Hello my name is Tom and i love playing football";
String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + ngramm_length + "}";
System.out.println(regex);
String[] ngramms = text.split(regex);
The result is 4 tokens, but only the last one ("football") still has any content; the first three are empty strings. I would like to get:
1: Hello my name
2: is Tom and
3: i love playing
4: football
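To see where the content goes, here is a small sketch that runs the same split and prints the array (the class SplitDemo and the method splitByGroups are made-up names for illustration). String.split() treats every match of the regex as a delimiter and discards the delimiters, so the three 3-word groups themselves never appear in the result.

```java
import java.util.Arrays;

public class SplitDemo {
    // Same call as in the question: every match of the regex is a delimiter,
    // and String.split() drops delimiters from the returned array.
    static String[] splitByGroups(String text, int ngramm_length) {
        String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + ngramm_length + "}";
        return text.split(regex);
    }

    public static void main(String[] args) {
        String[] ngramms = splitByGroups("Hello my name is Tom and i love playing football", 3);
        // The three 3-word groups were consumed as delimiters, leaving only the
        // empty strings before/between them plus the unmatched tail "football".
        System.out.println(ngramms.length);           // 4
        System.out.println(Arrays.toString(ngramms)); // [, , , football]
    }
}
```

Collecting the matches with Matcher.find() instead of splitting keeps the matched text, which is the approach taken below.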
Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        int length = 3; //2
        // Quantifier body for the shorter remainder: "1" for length 2, "1,2" for length 3
        String dynamic_length = "";
        for (int i = 1; i < length; i++) {
            dynamic_length += i;
            if (i + 1 < length) {
                dynamic_length += ",";
            }
        }
        // Either exactly `length` words, or the shorter group left at the end of the text
        final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + length + "}|([a-zA-Z0-9öÖäÄüÜß]+\\s){" + dynamic_length + "}";
        final String string = "Hello my name is Tom and i love playing football\n\n";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        int count = 0;
        while (matcher.find()) {
            ++count;
            System.out.println("match:" + count + " " + matcher.group(0));
        }
    }
}
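It is not dynamic, because it only works for lengths 2 and 3: for length 4, dynamic_length becomes "1,2,3", and for length 1 it stays empty, so the regex ends in {1,2,3} or {}, neither of which is a valid quantifier. Is that the problem with it, or am I missing something?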
For x > 1, I can use:
final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + length + "}|([a-zA-Z0-9öÖäÄüÜß]+\\s){1," + (length - 1) + "}";
For x = 1, I can use:
final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){" + length + "}|([a-zA-Z0-9öÖäÄüÜß]+\\s){1}";
or simply split on spaces.
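The two cases can be folded into one helper. This is a sketch rather than code from the thread: NgramRegex, buildRegex and ngrams are hypothetical names, and like the original it assumes every word is followed by whitespace (hence the trailing "\n\n" in the test string), since each repetition requires a \s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NgramRegex {
    private static final String WORD = "[a-zA-Z0-9öÖäÄüÜß]+\\s";

    // Hypothetical helper: exactly n words, or else the shorter remainder.
    // Uses {1,n-1} for n > 1 and {1} for n == 1, because {1,0} is rejected.
    static String buildRegex(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        String rest = (n == 1) ? "{1}" : "{1," + (n - 1) + "}";
        return "(" + WORD + "){" + n + "}|(" + WORD + ")" + rest;
    }

    // Collects every match, trimming the whitespace captured after the last word.
    static List<String> ngrams(String text, int n) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(buildRegex(n)).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello my name is Tom and i love playing football\n\n";
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + ": " + ngrams(text, n));
        }
    }
}
```

For n = 3, buildRegex produces the same pattern as the answer's {3}|{1,2} regex.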
Thanks to Maverick_Mrt!
You can try this:
([a-zA-Z0-9öÖäÄüÜß]+\s){3}|([a-zA-Z0-9öÖäÄüÜß]+\s){1,2}
Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Exactly three words, or one or two words for the remainder at the end
        final String regex = "([a-zA-Z0-9öÖäÄüÜß]+\\s){3}|([a-zA-Z0-9öÖäÄüÜß]+\\s){1,2}";
        final String string = "Hello my name is Tom and i love playing football\n\n";
        final Pattern pattern = Pattern.compile(regex);
        final Matcher matcher = pattern.matcher(string);
        int count = 0;
        while (matcher.find()) {
            ++count;
            System.out.println("match:" + count + " " + matcher.group(0));
        }
    }
}
As per your comment:
if you want n words per match, you can use the following pattern. Make sure n > 1; for n = 1 the range {1,0} is rejected by Java, so use {1} instead:
([a-zA-Z0-9öÖäÄüÜß]+\s){n}|([a-zA-Z0-9öÖäÄüÜß]+\s){1,n-1}
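A possible simplification, not part of the original answer: because quantifiers are greedy, ([a-zA-Z0-9öÖäÄüÜß]+\s){1,n} already takes n words whenever n are left and otherwise whatever remains, so it finds the same groups as {n}|{1,n-1} and stays valid for n = 1. The class name GreedyChunks is made up, and the sketch keeps the answer's assumption that each word is followed by whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyChunks {
    // Greedy {1,n}: n words when at least n remain, otherwise all that remain.
    // Unlike {n}|{1,n-1}, this is still a valid range when n == 1.
    static String regexFor(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return "([a-zA-Z0-9öÖäÄüÜß]+\\s){1," + n + "}";
    }

    // Collects every match, trimming the whitespace captured after the last word.
    static List<String> chunks(String text, int n) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regexFor(n)).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(0).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String string = "Hello my name is Tom and i love playing football\n\n";
        List<String> parts = chunks(string, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("match:" + (i + 1) + " " + parts.get(i));
        }
    }
}
```

The trade-off is readability: the alternation states both cases explicitly, while {1,n} relies on the reader knowing that the quantifier is greedy.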
Sample output
match:1 Hello my name
match:2 is Tom and
match:3 i love playing
match:4 football
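If a regex is not a requirement, the "split on spaces" idea from the question also generalizes to any x. This sketch (WordChunks and chunks are hypothetical names) splits on runs of whitespace and joins every n consecutive words with single spaces. Unlike the regex, it does not restrict words to [a-zA-Z0-9öÖäÄüÜß], so punctuation stays attached to its word, and it does not need whitespace after the last word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordChunks {
    // Splits on runs of whitespace, then joins each block of n words.
    static List<String> chunks(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        String trimmed = text.trim();
        List<String> result = new ArrayList<>();
        if (trimmed.isEmpty()) {
            return result; // no words at all
        }
        List<String> words = Arrays.asList(trimmed.split("\\s+"));
        for (int i = 0; i < words.size(); i += n) {
            int end = Math.min(i + n, words.size()); // last block may be shorter
            result.add(String.join(" ", words.subList(i, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> parts = chunks("Hello my name is Tom and i love playing football", 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println((i + 1) + ": " + parts.get(i));
        }
    }
}
```

For the example sentence this prints the four lines "1: Hello my name" through "4: football" requested in the question.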