Input is
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
Expected first output (as I am using a greedy quantifier):
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
Code used for the greedy case:
text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''
pattern=re.compile(r'\<p\>.*\<\/p\>')
data1=pattern.match(text,re.MULTILINE)
print('data1:- ',data1,'\n')
Expected second output (as I am using a lazy quantifier):
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
Code used for the lazy case:
text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''
#pattern=re.compile(r'\<p\>.*?\<\/p\>')
pattern=re.compile(r'<p>.*?</p>')
data1=pattern.match(text,re.MULTILINE)
print('data1:- ',data1,'\n')
I am getting None in both cases as the actual output.
You have a couple of issues. Firstly, the second and third parameters of Pattern.match are the positions pos and endpos, not flags, so re.MULTILINE (an integer flag with value 8) is treated as a starting position. Flags need to be passed to re.compile instead. Secondly, you should be using re.DOTALL, which makes the dot (.) match newlines, rather than re.MULTILINE, which only changes how ^ and $ behave. Finally, match insists that the match occurs at the beginning of the string (which in your case is a newline character), so it won't match. You might want to use Pattern.search instead. This will work for your sample input:
import re

pattern = re.compile(r'<p>.*</p>', re.DOTALL)
data1 = pattern.search(text)
print('data1:- ', data1.group(0), '\n')
Output:
data1:- <p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
Single match (lazy quantifier):
pattern = re.compile(r'<p>.*?</p>', re.DOTALL)
data1 = pattern.search(text)
print('data1:- ', data1.group(0), '\n')
Output:
data1:- <p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
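If you want every paragraph as a separate match rather than only the first one, the same lazy pattern can be passed to findall. This is a minimal sketch that reuses the sample text from the question; the variable name paragraphs is only illustrative.

```python
import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''

# Lazy quantifier plus DOTALL: each match stops at the nearest closing </p>.
pattern = re.compile(r'<p>.*?</p>', re.DOTALL)

# With no capturing groups, findall returns a list of the full matches.
paragraphs = pattern.findall(text)

for number, paragraph in enumerate(paragraphs, 1):
    print(number, repr(paragraph))
```

With the greedy pattern the same call would return one item spanning both paragraphs, since .* runs to the last </p> in the string.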
Note also that /, < and > have no special meaning in regexes and don't need to be escaped. I've removed the escapes in my code above.
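To see why the original calls returned None, it may help to check each of the three issues separately. The sketch below reuses the sample text from the question; the variable names are only illustrative.

```python
import re

text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''

lazy = re.compile(r'<p>.*?</p>')

# Issue 1: the second argument of Pattern.match is pos, not flags.
# re.MULTILINE has the integer value 8, so matching starts at index 8.
as_in_question = lazy.match(text, re.MULTILINE)
print(int(re.MULTILINE), as_in_question)

# Issue 2: without re.DOTALL the dot cannot cross the newline after <p>,
# so even search finds nothing.
without_dotall = lazy.search(text)
print(without_dotall)

# Issue 3: match only tries the start of the string, which is a newline here.
fixed = re.compile(r'<p>.*?</p>', re.DOTALL)
at_start = fixed.match(text)
print(at_start)

# search scans forward and finds the first paragraph.
found = fixed.search(text)
print(found.group(0))
```

The first three results are None and only the last call finds the paragraph, which is why the flags go to re.compile and search replaces match.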