python-2.7 multiline

python multiline regex capture


I have the following string:

hello
abcd
pqrs
123
123
123

My objective is to capture everything from hello up to and including the first occurrence of 123. So the expected output is:

hello
abcd
pqrs
123

I used the following:

output=re.findall('hello.*123?',input_string,re.DOTALL)

But the output is:

['hello\nabcd\npqrs\n123\n123\n123']

Is there a way to make this match non-greedy by using ? with 123? Or is there another way to get the expected output?


Solution

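  • Try using a lookahead for this. You are looking for a run of word characters and newlines that is followed by \n123\n. Because [\w\n]+ is greedy, the match extends to the last position where that lookahead succeeds, which for this input is just after the first 123: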

    import re
    
    input_string = """hello
    abcd
    pqrs
    123
    123
    123"""
    
    output_string = re.search(r'[\w\n]+(?=\n123\n)', input_string).group(0)
    
    print(output_string)
    
    #hello
    #abcd
    #pqrs
    #123
    

    I hope this proves useful.
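
The question also asks whether ? can make the match non-greedy. As a minimal sketch of that alternative (not the approach used in the answer above): in 'hello.*123?' the ? attaches to the final 3 and only makes it optional, while .* remains greedy. Placing ? directly after .* makes that quantifier lazy. The variable names greedy, lazy and first_block are illustrative only.

```python
import re

input_string = "hello\nabcd\npqrs\n123\n123\n123"

# Original attempt: the ? makes the trailing 3 optional, and .* is still
# greedy, so the match runs through to the last 123.
greedy = re.findall(r'hello.*123?', input_string, re.DOTALL)

# Lazy version: .*? consumes as little as possible, so the match ends
# at the first 123 it can reach.
lazy = re.search(r'hello.*?123', input_string, re.DOTALL)

# re.search returns None when nothing matches, so guard before .group().
first_block = lazy.group(0) if lazy else None
print(first_block)
```

Note that 'hello.*?123' stops at the first 123 anywhere in the text, including inside a longer number such as 1234. If 123 has to be a whole line, one option is to add re.MULTILINE and anchor it as ^123$.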