I'm a beginner using python. I want to create a regular expression to capture error messages from compiler output in python. How would I do this?
for example, if the compiler output is the following error message:
Traceback (most recent call last):
File "sample.py", line 1, in <module>
hello
NameError: name 'hello' is not defined
I want to be able to only extract only the following string from the output:
NameError: name 'hello' is not defined
In this case there is only one error, however I want to extract all the errors the compiler outputs. How do I do this using regular expressions? Or if there is an easier way, I'm open to suggestions
r'Traceback \(most recent call last\):\n(?:[ ]+.*\n)*(\w+: .*)'
should extract your exception; a traceback contains lines that all start with whitespace except for the exception line.
The above matches the literal text of the traceback first line, 0 or more lines that start with at least one space, and then captures the line following that provided it starts with 1 or more word characters (which fits Python identifiers nicely), a colon, and then the rest up to the end of a line.
Demo:
>>> import re
>>> sample = '''\
... Traceback (most recent call last):
... File "sample.py", line 1, in <module>
... hello
... NameError: name 'hello' is not defined
... '''
>>> re.search(r'Traceback \(most recent call last\):\n(?:[ ]+.*\n)*(\w+: .*)', sample).groups()
("NameError: name 'hello' is not defined",)