Given the following text data, I am seeing strange capturing-group behavior. When I iterate over all tables, only the last row of each table's data is captured. Is there a way to keep all captures (the values of every row of each table), not only those of the last row?
I am using this pattern: (?<tabname>\S+)\n\=*\n(?:(\d+)\ *\|\ *(\d+)\n)+
TABLE1
=======
1 | 2
15 | 2
3 | 15
TABLE2
=======
3 | 5
12 | 2
17 | 7
Edit: Sorry for the unclear question; here are my expected and actual outputs.
Expected output would be:
Match 1 of 2:
Group "tabname": TABLE1
Group 2: 1
Group 3: 2
Group 4: 15
Group 5: 2
Group 6: 3
Group 7: 15
Match 2 of 2:
Group "tabname": TABLE2
Group 2: 3
Group 3: 5
Group 4: 12
Group 5: 2
Group 6: 17
Group 7: 7
But actual output is:
Match 1 of 2:
Group "tabname": TABLE1
Group 2: 3
Group 3: 15
Match 2 of 2:
Group "tabname": TABLE1
Group 2: 17
Group 3: 7
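The behavior above can be reproduced with a small Java sketch. It assumes java.util.regex (the same engine the answer below uses), and the class and method names are made up for illustration. It applies the pattern from the question to the sample data and reads groups 2 and 3 after the first match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Hypothetical helper: returns what the groups hold after the first match.
    static String lastRowOfFirstTable(String input) {
        Pattern p = Pattern.compile(
                "(?<tabname>\\S+)\\n=*\\n(?:(\\d+) *\\| *(\\d+)\\n)+");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Groups 2 and 3 sit inside the repeated (?:...)+ part, so each
        // iteration overwrites them; only the last row is still available.
        return m.group("tabname") + ": " + m.group(2) + ", " + m.group(3);
    }

    public static void main(String[] args) {
        String s = "TABLE1\n=======\n1 | 2\n15 | 2\n3 | 15\n"
                 + "TABLE2\n=======\n3 | 5\n12 | 2\n17 | 7\n";
        System.out.println(lastRowOfFirstTable(s)); // prints "TABLE1: 3, 15"
    }
}
```

The rows "1 | 2" and "15 | 2" are matched, but their captures are gone by the time the match is returned.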
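A repeated capturing group keeps only the text from its last iteration, so a single match cannot give you every row. You can collect your data in two passes instead. The first regex will just match each table together with all of its values: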
"(?<tabledata>\\S+)\\s+\\S+(?<vals>[|\\d\\s]+)"
Next, we'll just match the numbers with the simple \d+ regex and add them to that table's list of strings.
Here is a full Java demo producing [[TABLE1, 1, 2, 15, 2, 3, 15], [TABLE2, 3, 5, 12, 2, 17, 7]]:
import java.util.*;
import java.util.regex.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        String s = "TABLE1\n=======\n1 | 2\n15 | 2\n3 | 15\n\nTABLE2\n=======\n3 | 5\n12 | 2\n17 | 7";
        // Pass 1: capture the table name and the whole block of rows.
        Pattern pattern = Pattern.compile("(?<tabledata>\\S+)\\s+\\S+(?<vals>[|\\d\\s]+)");
        Matcher matcher = pattern.matcher(s);
        List<List<String>> res = new ArrayList<>();
        while (matcher.find()) {
            List<String> lst = new ArrayList<>();
            if (matcher.group("tabledata") != null) {
                lst.add(matcher.group("tabledata"));
            }
            if (matcher.group("vals") != null) {
                // Pass 2: pull every number out of the captured block.
                Matcher m = Pattern.compile("\\d+").matcher(matcher.group("vals"));
                while (m.find()) {
                    lst.add(m.group(0));
                }
            }
            res.add(lst);
        }
        System.out.println(res);
    }
}
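If you want to keep the row structure rather than a flat list per table, the second pass can match one row at a time instead of one number at a time. The following is only a sketch of that variation: the class name, the method name, the row pattern (\d+)\s*\|\s*(\d+) and the choice of a LinkedHashMap are assumptions for illustration, not part of the demo above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TableRows {
    // Pass 1: table name, separator line, then the block of rows.
    private static final Pattern TABLE =
            Pattern.compile("(?<tabledata>\\S+)\\s+\\S+(?<vals>[|\\d\\s]+)");
    // Pass 2 (assumed variant): one "left | right" row per match.
    private static final Pattern ROW =
            Pattern.compile("(\\d+)\\s*\\|\\s*(\\d+)");

    // Hypothetical helper: maps each table name to its rows, in input order.
    static Map<String, List<List<String>>> parse(String input) {
        Map<String, List<List<String>>> tables = new LinkedHashMap<>();
        Matcher t = TABLE.matcher(input);
        while (t.find()) {
            List<List<String>> rows = new ArrayList<>();
            Matcher r = ROW.matcher(t.group("vals"));
            while (r.find()) {
                rows.add(Arrays.asList(r.group(1), r.group(2)));
            }
            tables.put(t.group("tabledata"), rows);
        }
        return tables;
    }

    public static void main(String[] args) {
        String s = "TABLE1\n=======\n1 | 2\n15 | 2\n3 | 15\n\n"
                 + "TABLE2\n=======\n3 | 5\n12 | 2\n17 | 7";
        // prints {TABLE1=[[1, 2], [15, 2], [3, 15]], TABLE2=[[3, 5], [12, 2], [17, 7]]}
        System.out.println(parse(s));
    }
}
```

Because the row pattern runs on the already-extracted block, its two groups are read once per find() call, so nothing is overwritten by a quantifier.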