I have a VHDL file which contains some paragraphs I want to extract. Generally, it looks like this:
Declaration 1.
Some code.
(The following is the part I want to extract)
case (state) is
case body 1
end case;
Declaration 2.
Some code.
(The following is the part I want to extract)
case (state) is
case body 2
end case;
So "case body 1" and "case body 2" are what I want. Whether "case (state) is" and "end case;" are included in the match does not matter. I have tried some methods like:
import re
with open('/home/liuduo/Desktop/f2.vhd') as f1:
    data=f1.read()
pattern=re.compile('case (state) is[\s\S]*?end case;')
reg=pattern.search(data).group()
or
pattern=re.compile('(?<=\bcase\b).*?(?=\bend\b)')
reg=pattern.search(data).group()
or
pattern=re.compile('.*?case(.*?)end.*?')
reg=pattern.search(data).group()
and many other methods, with the help of many examples on Stack Overflow (thanks, all!), but nothing seems to work.
The error I get is "AttributeError: 'NoneType' object has no attribute 'group'", which shows that nothing is matched. I am quite new to Python (3 days...) and have only a weak background in Java, so regular expressions really confuse me. Can anyone help me out with this?
Thank you so much!
P.S. If this has been asked before, I'm sorry. This is my first question on Stack Overflow, after hours of searching for answers. Please help me.
Try
import re
pattern = re.compile(r'case \S+ is\s*(.*?)\s*end case', re.DOTALL)
matches = pattern.findall(data)
print(matches)
# ['case body 1', 'case body 2']
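To try the pattern without the file, here is a self-contained sketch of the same approach. The inline `data` string is a hypothetical copy of the sample from the question standing in for the contents of f2.vhd; everything else is the code above.

```python
import re

# Hypothetical inline copy of the question's sample, used instead of
# reading /home/liuduo/Desktop/f2.vhd so the snippet runs on its own.
data = """Declaration 1.
Some code.
case (state) is
case body 1
end case;
Declaration 2.
Some code.
case (state) is
case body 2
end case;
"""

# \S+ matches the case expression, e.g. "(state)"; (.*?) lazily captures
# everything up to the next "end case", newlines included thanks to DOTALL.
pattern = re.compile(r'case \S+ is\s*(.*?)\s*end case', re.DOTALL)
matches = pattern.findall(data)
print(matches)  # ['case body 1', 'case body 2']
```

Two assumptions are built into this pattern: the case expression contains no spaces (so `\S+` covers it), and case statements are not nested. A nested `case` would end the lazy capture at the inner `end case`, and Python's `re` has no recursion to track that nesting.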
Your first regex fails because ( and ) are special characters in a regex (they create a capturing group), so they need to be escaped as \( and \) to match them literally.
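A small sketch of the difference, run against a single line from the sample. `re.escape()` is the standard-library helper that adds the backslashes for a fixed piece of text.

```python
import re

line = "case (state) is"

# Unescaped: "(state)" is a capturing group, so this pattern actually looks
# for the literal text "case state is", which is not in the line.
unescaped = re.search(r'case (state) is', line)
print(unescaped)  # None

# Backslash-escaped parentheses match a literal "(" and ")".
escaped = re.search(r'case \(state\) is', line)
print(escaped.group())  # case (state) is

# re.escape() inserts the needed backslashes when the text is fixed.
auto = re.search(re.escape("case (state) is"), line)
print(auto.group())  # case (state) is
```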
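Your second and third regexes fail because the . metacharacter doesn't match newlines by default; passing re.DOTALL changes that. In the second one there is a further problem: in an ordinary (non-raw) string literal, '\b' is turned into a backspace character before re ever sees it, so regex patterns should be written as raw strings, r'...'.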
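Both effects can be checked directly. The sketch below uses a simplified pattern, `r'case.*?end case'`, chosen only for illustration, on a three-line excerpt of the sample, and then compares the two kinds of string literal.

```python
import re

text = "case (state) is\ncase body 1\nend case;"

# Without re.DOTALL, "." never matches "\n", so the match cannot span lines.
no_dotall = re.search(r'case.*?end case', text)
print(no_dotall)  # None

# With re.DOTALL, "." also matches "\n", so the match crosses lines.
with_dotall = re.search(r'case.*?end case', text, re.DOTALL)
print(repr(with_dotall.group()))  # 'case (state) is\ncase body 1\nend case'

# In an ordinary string literal "\b" is already a backspace (\x08) before
# re sees it; a raw string keeps the two characters "\" and "b".
print('\b' == '\x08')  # True
print(len(r'\b'))      # 2
```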
The search method only returns the first match, so I used findall to get a list of all the matches.
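As a sketch of the alternatives, the snippet below guards `search()` against the `None` that caused the original AttributeError, and uses `finditer()`, which yields one match object per block (useful if positions are needed too). The `data` string is again a hypothetical stand-in for the file contents.

```python
import re

# Hypothetical two-block sample standing in for the contents of the .vhd file.
data = ("case (state) is\ncase body 1\nend case;\n"
        "case (state) is\ncase body 2\nend case;\n")
pattern = re.compile(r'case \S+ is\s*(.*?)\s*end case', re.DOTALL)

# search() stops at the first match and returns None when there is none;
# calling .group() on that None is what raises the AttributeError.
first = pattern.search(data)
if first is None:
    print("no case block found")
else:
    print(first.group(1))  # case body 1

# finditer() yields a match object per block, including its start offset.
bodies = []
for m in pattern.finditer(data):
    print(m.start(), repr(m.group(1)))
    bodies.append(m.group(1))
```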
Further explanation on request.