Greedy and lazy quantifiers: testing with HTML tags

The input is:

The very <em>first</em> task is to find the beginning of a paragraph.
Then you have to find the end of the paragraph

The expected first output is (as I am using a greedy quantifier):

The very <em>first</em> task is to find the beginning of a paragraph.
Then you have to find the end of the paragraph

The code used for the greedy case is below:

text = '''
The very <em>first</em> task is to find the beginning of a paragraph.
Then you have to find the end of the paragraph
'''
print('data1:- ',data1,'\n')

The expected second output is (as I am using a lazy quantifier):

The very <em>first</em> task is to find the beginning of a paragraph.

The code used for the lazy case is below:

text = '''
The very <em>first</em> task is to find the beginning of a paragraph.
Then you have to find the end of the paragraph
'''
print('data1:- ',data1,'\n')

I am getting None in both cases as the actual output.


  • You have a couple of issues. Firstly, the second and third parameters of Pattern.match are positions (pos and endpos), not flags; the flags need to be passed to re.compile. Secondly, you should be using re.DOTALL to make . match newlines, not re.MULTILINE. Finally, match insists that the match starts at the beginning of the string (which in your case is a newline character), so it won't match. You might want to use search instead. This will work for your sample input:

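    import re

    pattern = re.compile(r'<p>.*</p>', re.DOTALL)
    # search() scans the whole string; group(0) is the matched text
    print('data1:- ', pattern.search(text).group(0), '\n')

    Output:
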
    data1:-  <p>
    The very <em>first</em> task is to find the beginning of a paragraph.
    </p>
    <p>
    Then you have to find the end of the paragraph
    </p>
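
To see why the original attempt printed None, here is a rough sketch that reproduces each of the three issues separately. The sample text is an assumption: the `<p>`/`</p>` wrappers are a guess at what the original input looked like, and the variable names are only illustrative.

```python
import re

# Assumed sample text: the <p>...</p> wrappers are a guess at the original input.
text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''

# 1. A flag passed to Pattern.match is read as the start position `pos`.
#    re.MULTILINE has the integer value 8, so matching starts at index 8.
no_flags = re.compile(r'<p>.*</p>')
wrong_call = no_flags.match(text, re.MULTILINE)

# 2. match() only succeeds at the start position, and the text starts with '\n'.
dotall = re.compile(r'<p>.*</p>', re.DOTALL)
anchored = dotall.match(text)

# 3. Without re.DOTALL, '.' stops at newlines, so even search() finds nothing.
multiline_only = re.compile(r'<p>.*</p>', re.MULTILINE).search(text)

# search() together with re.DOTALL does find the paragraphs.
found = dotall.search(text)

print(wrong_call, anchored, multiline_only)
print(found.group(0))
```

The first three results are all None, each for a different reason, which is why fixing only one of the issues would not have been enough.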

    To match only a single paragraph, use the lazy quantifier:

    pattern = re.compile(r'<p>.*?</p>', re.DOTALL)
    print('data1:- ', pattern.search(text).group(0), '\n')

    Output:

    data1:-  <p>
    The very <em>first</em> task is to find the beginning of a paragraph.
    </p>

    Note also that /, < and > have no special meaning in regexes and don't need to be escaped. I've removed those escapes in my code above.
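
For a side-by-side comparison, here is a minimal sketch that runs the greedy and lazy patterns on the same text, collects every paragraph with findall, and checks that the escaped form of the pattern behaves the same as the unescaped one. The sample text is again an assumed reconstruction of the input with `<p>`/`</p>` wrappers, and the variable names are made up for the example.

```python
import re

# Assumed sample text: the <p>...</p> wrappers are a guess at the original input.
text = '''
<p>
The very <em>first</em> task is to find the beginning of a paragraph.
</p>
<p>
Then you have to find the end of the paragraph
</p>
'''

greedy = re.compile(r'<p>.*</p>', re.DOTALL)
lazy = re.compile(r'<p>.*?</p>', re.DOTALL)

# Greedy: from the first <p> through the LAST </p> in the text.
greedy_match = greedy.search(text).group(0)

# Lazy: from the first <p> through the NEAREST </p>.
lazy_match = lazy.search(text).group(0)

# findall with the lazy pattern returns each paragraph as its own string.
all_paragraphs = lazy.findall(text)

# Escaping /, < and > is harmless but unnecessary: the result is the same.
escaped = re.compile(r'\<p\>.*?\<\/p\>', re.DOTALL)
escaped_match = escaped.search(text).group(0)

print(greedy_match.count('</p>'))   # number of closing tags the greedy match spans
print(lazy_match)
print(len(all_paragraphs))
print(escaped_match == lazy_match)
```

If you want each paragraph rather than just the first, findall (or finditer) with the lazy pattern is usually the simpler choice than calling search repeatedly.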