Tags: python, regex, regex-lookarounds, multiline

How do I use regex to match multi-line text with specific starting and ending patterns?


Using Python regex, I am trying to extract all the lines that follow a [..] section header and start with the ;; characters. See the example below:

sample_str = '''[TITLE]

[OPTIONS]
;;Options            Value
;;------------------ ------------
FLOW_UNITS           CFS
<MORE TEXT>

[PATTERNS]
;;Name           Type       Multipliers
;;-------------- ---------- -----------
;Daily pattern generated from time series '2-166:2-165 (obs)'.  Average value was 0.0485 MGD.
2-166:2-165_(obs)_Daily DAILY      1.011 1.008 1.06  0.908 1.072 0.998 0.942
<MORE TEXT>

[COORDINATES]
;;Node           X-Coord          Y-Coord         
;;-------------- ---------------- ----------------
<MORE TEXT>

[JUNCTIONS]
;;               Invert     Max.       Init.      Surcharge  Ponded    
;;Name           Elev.      Depth      Depth      Depth      Area      
;;-------------- ---------- ---------- ---------- ---------- ----------
1-1              837.85     15.25      0          0          0         
<MORE TEXT>  

[REPORT]
INPUT      YES
CONTROLS   NO
<MORE TEXT>
'''

I would like to get a list like this:

expected_result = [';;Options            Value\n;;------------------ ------------', ';;Name           Type       Multipliers\n;;-------------- ---------- -----------', ..]

I was only able to get the first line of each block with re.findall(r"(?<=\]\n);;.*", sample_str). Extending the pattern with \n, as in re.findall(r"(?<=\]\n);;.*\n;;.*", sample_str, re.MULTILINE), does not work because the blocks I want are not uniform (some sections have more ;; lines than others). I also tried using re.MULTILINE to match everything up to the final -\n, as in re.findall(r"(?<=\]\n);;.*-$", sample_str, re.MULTILINE), but I could not get that to work either.

Could someone help me with this?


Solution

  • You can use something like this:

    re.findall(r"^\[.*\]\n+((?:;;.*\n+)+)", sample_str, re.M)
    

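    Here is the explanation of the expression: ^\[.*\] matches a section header such as [OPTIONS] at the start of a line (re.M makes ^ match at the start of every line, not only at the start of the string); \n+ consumes the line break(s) after the header; and the capturing group ((?:;;.*\n+)+) collects one or more consecutive lines that begin with ;;. Because each captured line includes its line break, every result ends with at least one trailing \n. Sections whose header is not followed by a ;; line, such as [TITLE] and [REPORT], are not matched.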


    EDIT: Added a constraint so that the pattern must start at the beginning of a line. Thanks to @Wiktor Stribiżew for noticing.
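
For reference, here is a minimal runnable sketch of the expression above applied to a shortened stand-in for the question's sample_str. The helper name extract_header_blocks and the trimmed sample text are illustrative assumptions, not part of the original post. Since the captured group keeps the line break after each ;; line, the sketch strips trailing newlines so the output has the shape of expected_result.

```python
import re

# Shortened, illustrative stand-in for the question's sample_str.
sample_str = """[TITLE]

[OPTIONS]
;;Options            Value
;;------------------ ------------
FLOW_UNITS           CFS

[PATTERNS]
;;Name           Type       Multipliers
;;-------------- ---------- -----------
;Daily pattern generated from time series
2-166:2-165_(obs)_Daily DAILY      1.011 1.008

[REPORT]
INPUT      YES
"""

# Same expression as in the answer: a [SECTION] header at the start of a
# line, followed by one or more consecutive lines beginning with ';;'.
HEADER_BLOCK = re.compile(r"^\[.*\]\n+((?:;;.*\n+)+)", re.M)


def extract_header_blocks(text):
    """Return the ';;' blocks that follow each section header.

    Hypothetical helper: the regex captures the newline after every
    ';;' line, so trailing newlines are stripped from each block.
    """
    return [block.rstrip("\n") for block in HEADER_BLOCK.findall(text)]


result = extract_header_blocks(sample_str)
for block in result:
    print(repr(block))
```

[TITLE] and [REPORT] produce no entry because no ;; line follows them, and the single-semicolon comment line under [PATTERNS] ends that block because it does not start with ;;.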