Python: use .search method to extract everything between 2 words which occur more than once

I have a VHDL file that contains some paragraphs I want to extract. Generally, it looks like this:

Declaration 1.
Some codes.
(Following are paragraphs I want to extract)
case (state) is
    case body 1
end case;

Declaration 2.
Some codes.
(Following are paragraphs I want to extract)
case (state) is
    case body 2
end case;

So "case body 1" and "case body 2" are what I want. It does not matter whether "case (state) is" and "end case;" are included in the match or not. I have tried some methods like:

f1=open('/home/liuduo/Desktop/f2.vhd')
data=f1.read()
pattern=re.compile('case (state) is[\s\S]*?end case;')
reg=pattern.search(data).group()

pattern=re.compile('(?<=\bcase\b).*?(?=\bend\b)')
reg=pattern.search(data).group()

pattern=re.compile('.*?case(.*?)end.*?')
reg=pattern.search(data).group()

and many other methods, with the help of many examples on Stack Overflow (thanks, all!). But nothing seems to work.

The error I get is "AttributeError: 'NoneType' object has no attribute 'group'", which means nothing was matched. I am quite new to Python (3 days...) and have a weak background in Java, so regular expressions really confuse me. Can anyone help me out with this?

Thank you so much!

P.S. If this has been asked before, I am really sorry. This is my first question on Stack Overflow, after hours of searching for answers. Please help me.

Solution

Try

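import re

# re.DOTALL lets "." match newlines, so a case body may span several lines.
pattern = re.compile(r'case \S+ is\s*(.*?)\s*end case', re.DOTALL)
# findall returns the captured group for every match, not just the first one.
matches = pattern.findall(data)

print(matches)
# ['case body 1', 'case body 2']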

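Your first regex fails because ( and ) are special characters in a regex. Unescaped, they form a capturing group, so (state) matches the text state rather than (state). Escape them as \( and \) to match them literally.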
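
To make the effect of the escaping concrete, here is a minimal sketch. The `data` string is only a stand-in modelled on the layout in the question (an assumption, not the real file), and the names `unescaped` and `escaped` are just for illustration.

```python
import re

# Stand-in for the file contents, modelled on the question (an assumption).
data = """Declaration 1.
Some codes.
case (state) is
    case body 1
end case;

Declaration 2.
Some codes.
case (state) is
    case body 2
end case;
"""

# Unescaped: (state) is a capturing group that matches the bare text "state",
# so the pattern looks for "case state is", which never occurs in the file.
unescaped = re.compile(r'case (state) is[\s\S]*?end case;')
print(unescaped.search(data))  # None

# Escaped: \( and \) match literal parentheses.
escaped = re.compile(r'case \(state\) is[\s\S]*?end case;')
first = escaped.search(data)
print(first.group())
```

With the escaped pattern, `search` finds the whole first block, from "case (state) is" through "end case;".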

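Your second and third regexes fail because . doesn't match newlines by default, so they cannot span several lines; the re.DOTALL flag changes that. The second one also needs a raw string (r'...'): in a normal Python string, \b is a backspace character, not a word boundary.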
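
A short sketch of both points, using a small made-up `text` string (an assumption standing in for one block of the file):

```python
import re

# One block in the shape shown in the question (illustrative only).
text = "case (state) is\n    case body 1\nend case;\n"

# Without re.DOTALL, "." stops at every newline, so nothing can span the lines.
no_flag = re.search(r'case(.*?)end', text)
print(no_flag)  # None

# With re.DOTALL, "." also matches "\n".
with_flag = re.search(r'case(.*?)end', text, re.DOTALL)
print(repr(with_flag.group(1)))

# In a normal string '\b' is one backspace character;
# in a raw string r'\b' is two characters that the regex engine reads as a word boundary.
print(len('\b'), len(r'\b'))  # 1 2
plain = re.search('\bcase\b', text)   # looks for literal backspace characters
raw = re.search(r'\bcase\b', text)    # looks for the whole word "case"
print(plain)        # None
print(raw.group())  # case
```

Writing every pattern as a raw string is the usual way to avoid this kind of surprise.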

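The search method returns only the first match, so I used findall to get a list of all the matches. Because the pattern contains exactly one capturing group, findall returns just the captured text (the case body) for each match.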
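
For comparison, here is a rough sketch of search, findall and finditer side by side. The `data` string is again only a stand-in for the file contents (an assumption), and the `None` check shows one way to avoid the AttributeError from the question.

```python
import re

# Stand-in for the file contents, modelled on the question (an assumption).
data = """Declaration 1.
Some codes.
case (state) is
    case body 1
end case;

Declaration 2.
Some codes.
case (state) is
    case body 2
end case;
"""

pattern = re.compile(r'case \S+ is\s*(.*?)\s*end case', re.DOTALL)

# search: first match only. It returns None when nothing matches,
# so check before calling .group() to avoid the AttributeError.
first = pattern.search(data)
if first is not None:
    print(first.group(1))  # case body 1

# findall: a list with the captured group of every match.
bodies = pattern.findall(data)
print(bodies)  # ['case body 1', 'case body 2']

# finditer: one match object per match, useful when positions are needed too.
spans = [(m.start(), m.group(1)) for m in pattern.finditer(data)]
for start, body in spans:
    print(start, repr(body))
```

findall is enough when only the bodies are needed; finditer is the option to reach for when the match positions matter as well.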

Further explanation on request.