python, regex, python-re

Replace block of consecutive lines starting with same pattern


I'd like to match (and replace, using a custom replacement function) each block of consecutive lines that all start with foo. This nearly works:

import re

s = """bar6387
bar63287
foo1234
foohelloworld
fooloremipsum
baz
bar
foo236
foo5382
bar
foo879"""

def f(m):
    print(m)

s = re.sub('(foo.*\n)+', f, s)
print(s)
# <re.Match object; span=(17, 53), match='foo1234\nfoohelloworld\nfooloremipsum\n'>
# <re.Match object; span=(61, 76), match='foo236\nfoo5382\n'>

but it fails to recognize the last block, because that block is on the last line of the string and there is no \n after it.

Is there a cleaner way to match a block of one or more consecutive lines that start with the same pattern foo?


Solution

  • You can use

    re.sub(r'(?m)^foo.*(?:\nfoo.*)*', f, s)
    re.sub(r'^foo.*(?:\nfoo.*)*', f, s, flags=re.M)
    

    where

    • ^ - matches the start of the string (here, the start of any line, due to the (?m) or re.M option)
    • foo - matches foo
    • .* - matches zero or more characters other than line break characters, as many as possible
    • (?:\nfoo.*)* - zero or more sequences of a newline, foo and then the rest of the line.

    See the Python demo:

    import re
    
    s = "bar6387\nbar63287\nfoo1234\nfoohelloworld\nfooloremipsum\nbaz\nbar\nfoo236\nfoo5382\nbar\nfoo879"
    def f(m):
        print(m.group().replace('\n', r'\n'))
    
    re.sub(r'(?m)^foo.*(?:\nfoo.*)*', f, s)
    

    Output:

    foo1234\nfoohelloworld\nfooloremipsum
    foo236\nfoo5382
    foo879
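
The f in the demo only prints each match. As a minimal sketch of an actual replacement, assuming you want to collapse each block into a single summary line, something like the following should work. The collapse function and its output format are made up for illustration; only the pattern comes from the answer above.

```python
import re

s = "bar6387\nbar63287\nfoo1234\nfoohelloworld\nfooloremipsum\nbaz\nbar\nfoo236\nfoo5382\nbar\nfoo879"

# Hypothetical replacement function: the match never includes the newline
# that follows the block, so splitting on "\n" gives exactly the block's lines.
def collapse(m):
    lines = m.group().split("\n")
    return f"<{len(lines)} foo lines>"

# Same pattern as above; re.M makes ^ match at the start of every line.
result = re.sub(r"^foo.*(?:\nfoo.*)*", collapse, s, flags=re.M)
print(result)
```

Because the pattern stops before the newline that ends the block, the text around each block keeps its line breaks, and the final block is matched even though the string has no trailing \n.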