java regex backtracking capturing-group

Repeated capturing group matches only the last occurrence


Given the text data below, I am seeing strange capturing group behavior. When I iterate over all tables, only the last row of data in each table is captured. Is there a way to keep every capture (the values of all rows in each table), not only those of the last row?

I am using this pattern (?<tabname>\S+)\n\=*\n(?:(\d+)\ *\|\ *(\d+)\n)+

TABLE1
=======
1  | 2
15 | 2
3  | 15

TABLE2
=======
3  | 5
12 | 2
17 | 7

Edit: Sorry for the unclear question; here are my expected and actual outputs:

Expected output would be:

Match 1 of 2:

Group "tabname":    TABLE1
Group 2:    1
Group 3:    2
Group 4:    15
Group 5:    2
Group 6:    3
Group 7:    15

Match 2 of 2:

Group "tabname":    TABLE2
Group 2:    3
Group 3:    5
Group 4:    12
Group 5:    2
Group 6:    17
Group 7:    7

But the actual output is:

Match 1 of 2:

Group "tabname":    TABLE1
Group 2:    3
Group 3:    15

Match 2 of 2:

Group "tabname":    TABLE2
Group 2:    17
Group 3:    7

Solution

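  • In Java, a repeated capturing group keeps only the text from its last iteration, so a single match cannot return every row. You can collect your data in two passes instead. The first regex just matches each table together with all of its values: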

    "(?<tabledata>\\S+)\\s+\\S+(?<vals>[|\\d\\s]+)"
    

    Next, we just match the numbers inside each table's values (with the simple \d+ regex) and add them to a list of strings.

    Here is a full Java demo producing [[TABLE1, 1, 2, 15, 2, 3, 15], [TABLE2, 3, 5, 12, 2, 17, 7]]:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    class Ideone
    {
        public static void main (String[] args)
        {
            String s = "TABLE1\n=======\n1  | 2\n15 | 2\n3  | 15\n\nTABLE2\n=======\n3  | 5\n12 | 2\n17 | 7";
            // Pass 1: one match per table; the name goes to "tabledata", all rows to "vals".
            Pattern pattern = Pattern.compile("(?<tabledata>\\S+)\\s+\\S+(?<vals>[|\\d\\s]+)");
            // Pass 2: extracts every number from a table's "vals" text.
            Pattern number = Pattern.compile("\\d+");
            Matcher matcher = pattern.matcher(s);
            List<List<String>> res = new ArrayList<>();
            while (matcher.find()) {
                List<String> lst = new ArrayList<>();
                // Both groups are mandatory in the pattern, so they are never null here.
                lst.add(matcher.group("tabledata"));
                Matcher m = number.matcher(matcher.group("vals"));
                while (m.find()) {
                    lst.add(m.group());
                }
                res.add(lst);
            }
            System.out.println(res);
        }
    }
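
For comparison, here is a minimal sketch of why the pattern from the question reports only the last row, and of a stricter two-pass variant that stays close to that pattern. The class name RepeatedGroupDemo, the method names, and the named group "rows" are assumptions made for this illustration; they are not part of the question or of the demo above. The input is assumed to end with a newline, since the question's pattern requires one after every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The pattern from the question, with Java string escaping applied.
    static final Pattern QUESTION_PATTERN =
        Pattern.compile("(?<tabname>\\S+)\\n=*\\n(?:(\\d+) *\\| *(\\d+)\\n)+");

    // Hypothetical variant: the whole repeated block is captured as "rows"
    // so that a second matcher can walk through it row by row.
    static final Pattern TABLE_PATTERN =
        Pattern.compile("(?<tabname>\\S+)\\n=*\\n(?<rows>(?:\\d+ *\\| *\\d+\\n)+)");

    static final Pattern ROW_PATTERN = Pattern.compile("(\\d+) *\\| *(\\d+)");

    // Shows what groups 2 and 3 hold after each match of the question's pattern:
    // each iteration of the repeated group overwrites the previous capture.
    static List<String> lastCaptures(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = QUESTION_PATTERN.matcher(input);
        while (m.find()) {
            out.add(m.group("tabname") + ": " + m.group(2) + ", " + m.group(3));
        }
        return out;
    }

    // Outer matcher finds each table; inner matcher collects every row's values.
    static List<List<String>> allRows(String input) {
        List<List<String>> tables = new ArrayList<>();
        Matcher table = TABLE_PATTERN.matcher(input);
        while (table.find()) {
            List<String> values = new ArrayList<>();
            values.add(table.group("tabname"));
            Matcher row = ROW_PATTERN.matcher(table.group("rows"));
            while (row.find()) {
                values.add(row.group(1));
                values.add(row.group(2));
            }
            tables.add(values);
        }
        return tables;
    }

    public static void main(String[] args) {
        String s = "TABLE1\n=======\n1  | 2\n15 | 2\n3  | 15\n\n"
                 + "TABLE2\n=======\n3  | 5\n12 | 2\n17 | 7\n";
        System.out.println(lastCaptures(s)); // [TABLE1: 3, 15, TABLE2: 17, 7]
        System.out.println(allRows(s));      // [[TABLE1, 1, 2, 15, 2, 3, 15], [TABLE2, 3, 5, 12, 2, 17, 7]]
    }
}
```

Compared with the looser [|\d\s]+ character class, this variant only accepts rows of the form "number | number", which may be preferable if the input can contain other text between tables.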