I have the following string:
hello
abcd
pqrs
123
123
123
My objective is to capture everything from hello up to and including the first occurrence of 123. The expected output is:
hello
abcd
pqrs
123
I used the following:
output=re.findall('hello.*123?',input_string,re.DOTALL)
But the output is as:
['hello\nabcd\npqrs\n123\n123\n123']
Is there a way to make this match non-greedy using ? for 123? Or is there another way to achieve the expected output?
Try using a lookahead for this. You are looking for a group of characters followed by \n123\n:
import re
input_string = """hello
abcd
pqrs
123
123
123"""
output_string = re.search(r'[\w\n]+(?=\n123\n)', input_string).group(0)
print(output_string)
#hello
#abcd
#pqrs
#123
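As an alternative sketch, not the lookahead approach shown above: in the question's pattern, the ? in 123? only makes the final 3 optional. It does not make .* lazy. Placing the ? directly after .* should make the match stop at the first 123. The helper name capture_until_first and its start/end parameters are hypothetical, introduced only for illustration.

```python
import re


def capture_until_first(text, start="hello", end="123"):
    """Hypothetical helper: return the text from `start` through the first `end`."""
    # `.*?` is the lazy form of `.*`: it expands only as far as needed
    # for the rest of the pattern (`end`) to match.
    pattern = re.escape(start) + r".*?" + re.escape(end)
    # re.DOTALL lets `.` match newlines so the match can span lines.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else None


input_string = """hello
abcd
pqrs
123
123
123"""

result = capture_until_first(input_string)
print(result)
```

Unlike the greedy lookahead, this does not depend on how many 123 lines follow the first one. Note that it stops at the first 123 anywhere in the text, even inside a longer token, so a stricter pattern would be needed if 123 must sit on its own line.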
I hope this proves useful.