Tags: python, regex-lookarounds, regex-group

Grouping a repeated section that starts with a specific word and ends with a specific format


I am trying to capture multiline strings that start with a specific word (Case) and end with a date in the format dd.mm.yyyy or dd.m.yyyy.

Here is sample text:

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818

I am trying this:

Flags: g m i

^case[^]*\d{1,2}\.\d{1,2}\.\d{2,4}
^case[\s\S]*\d{1,2}\.\d{1,2}\.\d{2,4}
((^case)[\s\S]+(\d{1,2}\.\d{1,2}\.\d{2,4}))

Note: the case-insensitive flag is set.

I expect one match per paragraph (from Case to the date).

Instead, these expressions capture only one group, running from the first Case to the last date:

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii) sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date .........29.8.1818

I am looking into newline handling and lookarounds.
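
The single-match behaviour described above can be reproduced in Python with a short sketch. The text here is a shortened stand-in for the sample (two paragraphs, with made-up dates so they can be told apart), and the names `text`, `date`, `greedy` and `lazy` are purely illustrative. It uses `[\s\S]*` because `[^]` is not accepted by Python's `re` module. It also shows that swapping the greedy quantifier for a lazy one is enough to change the result.

```python
import re

# Shortened stand-in for the sample text (two paragraphs instead of four).
text = (
    "Case No.X.1 first paragraph. Closing date ......... 29.8.1818\n"
    "\n"
    "Case No.X.2 second paragraph. Closing date ......... 30.9.1819\n"
)

date = r"\d{1,2}\.\d{1,2}\.\d{2,4}"
flags = re.M | re.I  # multiline anchors, case-insensitive

# Greedy: [\s\S]* runs to the end of the text, then backtracks to the LAST date.
greedy = re.findall(r"^case[\s\S]*" + date, text, flags)

# Lazy: [\s\S]*? grows one character at a time and stops at the FIRST date.
lazy = re.findall(r"^case[\s\S]*?" + date, text, flags)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

The greedy pattern yields a single match spanning both paragraphs, while the lazy one yields one match per paragraph.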


Solution

  • Use a lazy quantifier (.*?) together with flags=re.DOTALL|re.M (regex101):

    data = '''Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
    
    Case No.X.2 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
    
    Case No.X.3 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
    
    Case No.X.4 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818'''
    
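    import re
    
    # re.DOTALL lets . match newlines; re.M makes ^ and $ anchor at every line.
    # The lazy .*? stops at the first date that ends a line, giving one match per case.
    for m in re.findall(r'^Case.*?\d{1,2}\.\d{1,2}\.\d{2,4}$', data, flags=re.DOTALL|re.M):
        print(m)
        print('-' * 160)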
    

    Prints:

    Case No.X.1 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    Case No.X.2 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    Case No.X.3 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
    Case No.X.4 I know I am a lucky boy (1) a sda asddasd (ii)
    sa asdas asd aklk Railway, Airplane asd - (one two three four). Closing date ......... 29.8.1818
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------
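
Since the question mentions lookarounds, here is a hedged variant rather than part of the answer above. It assumes that some paragraphs might lack a closing date, which the sample text does not show. In that situation the lazy pattern would run on into the next case, and a negative lookahead can stop a match from crossing into another `Case` line. The sample data, the dates in it, and the group names `body` and `date` are made up for illustration.

```python
import re

# Hypothetical data: the second case has no closing date.
data = """Case No.X.1 first paragraph (1) a
continued line. Closing date ......... 29.8.1818

Case No.X.2 this one has no closing date

Case No.X.3 third paragraph. Closing date ......... 1.12.1820"""

pattern = re.compile(
    r"""
    ^Case\b                               # a line starting with the word Case
    (?P<body>(?:(?!^Case\b).)*?)          # lazy body that never enters the next Case line
    (?P<date>\d{1,2}\.\d{1,2}\.\d{2,4})$  # a date that ends a line
    """,
    flags=re.DOTALL | re.M | re.X,
)

# The guarded pattern skips the undated case entirely.
dates = [m.group("date") for m in pattern.finditer(data)]
print(dates)  # ['29.8.1818', '1.12.1820']

# For contrast, the unguarded pattern glues case X.2 onto the date of case X.3.
plain = re.findall(r"^Case.*?\d{1,2}\.\d{1,2}\.\d{2,4}$", data, flags=re.DOTALL | re.M)
print(plain[1].splitlines()[0])  # Case No.X.2 this one has no closing date
```

If every case is guaranteed to end with a date, the simpler pattern is enough. The guard only matters when the input can be incomplete.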