
RegEx splitting multiple lines - grouped by ID


I would like to create a regular expression to split a CSV file into multiple files based on an ID which can be found in each row. Here is an example:

ID,text,value
1,some text,5
1,other text,3
1,something,2
2,sample,4
3,john doe,2
3,jane doe,3
4,foo,1
4,bar,2
4,baz,3

The expected capturing groups would be:

1,some text,5
1,other text,3
1,something,2

2,sample,4

3,john doe,2
3,jane doe,3

4,foo,1
4,bar,2
4,baz,3

What makes it even more complicated is that an ID might appear once or multiple times (ID 2 above appears only once). Here is what I tried:

^.*?\n(((\d+),.*?\n)(\2.*?){0,})

But it is not working as expected.


Solution

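  • You can use this regex in MULTILINE mode to match each block of consecutive rows that share the same ID:

    ^(\d+),.*[\r\n]+(?:\1,.*[\r\n]+)*

    RegEx Details:

    • ^: Start of a line (in MULTILINE mode)
    • (\d+),: Match 1+ digits, capture them in group #1, then match the comma that ends the ID field
    • .*[\r\n]+: Match the rest of the line and the line break(s) that follow it
    • (?:\1,.*[\r\n]+)*: Match the same ID (back-reference \1) followed by a comma, then the rest of that line. Repeat this group 0 or more times

    The comma after the ID keeps ID 1 from also matching rows that start with 10. The header row is skipped because it does not start with a digit, and the last row needs a trailing line break to be matched.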
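To see the grouping in practice, here is a minimal Python sketch, not part of the original answer. It assumes the rows are already ordered so that equal IDs are adjacent, that the text ends with a line break, and it uses the pattern with an explicit comma after the ID (so that ID 1 cannot also match rows for ID 10). The names `CSV_TEXT`, `BLOCK_RE` and `split_by_id` are made up for illustration.

```python
import re

# Sample data copied from the question; a trailing newline is assumed.
CSV_TEXT = """ID,text,value
1,some text,5
1,other text,3
1,something,2
2,sample,4
3,john doe,2
3,jane doe,3
4,foo,1
4,bar,2
4,baz,3
"""

# One match = one run of consecutive lines starting with the same ID.
# The comma after (\d+) and after \1 ties the match to the whole ID field.
BLOCK_RE = re.compile(r"^(\d+),.*[\r\n]+(?:\1,.*[\r\n]+)*", re.MULTILINE)


def split_by_id(text):
    """Return a list of (id, block) pairs, one per run of equal IDs."""
    return [(m.group(1), m.group(0)) for m in BLOCK_RE.finditer(text)]


blocks = split_by_id(CSV_TEXT)
for row_id, block in blocks:
    print(f"--- ID {row_id} ---")
    print(block, end="")
```

Each block is the full match (group 0), and group 1 holds the ID, so writing one file per ID is a matter of looping over the pairs. If the same ID can reappear later in the file, it will show up as a separate pair, which is why the sketch returns a list rather than a dictionary.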