I'm attempting to write a regex that captures octal characters.
For example, if the line I'm comparing to my regex is:
char x = '\077';
I want my regex to capture '\077'
I attempted to do this via the re module and a regex of the form:
"'\\[0-7]{1-3}'"
But this doesn't capture the octal character. How can octal characters be identified using regex in Python?
Edit:
As an example of what I mean, consider the C code:
char x = '\077';
printf("%c", x);
I would like to capture '\077' from the first line.
Edit:
After testing some of the suggestions in this thread, I have a version that works. Once the octal regex became part of a larger regex, I needed to either prefix the pattern string with r (a raw string literal) or escape each backslash, for a total of four backslashes.
For example, both of these solve the problem:
regex = re.compile(r"\s*("
    r"'\\0[0-7]{1,2}'"  # octal
    r"|[a-zA-Z_][a-zA-Z_\d]*"  # identifier
    ")")
regex.findall(line)
and
regex = re.compile(r"\s*("
    "'\\\\0[0-7]{1,2}'"  # octal
    r"|[a-zA-Z_][a-zA-Z_\d]*"  # identifier
    ")")
regex.findall(line)
Both include '\077' in the list returned by findall (after the identifier char) if line is: char = '\077';
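A quick way to check that the two spellings really are the same regex is to compare the strings Python hands to re. This is only a minimal sketch; the names raw_form, escaped_form, pattern and matches are illustrative and not part of the original code.

```python
import re

# Raw-string spelling and doubled-backslash spelling of the octal branch.
raw_form = r"'\\0[0-7]{1,2}'"
escaped_form = "'\\\\0[0-7]{1,2}'"

# Both literals produce the same pattern text, so re compiles the same regex.
assert raw_form == escaped_form

pattern = re.compile(r"\s*(" + raw_form + r"|[a-zA-Z_][a-zA-Z_\d]*" + r")")

# The r prefix keeps the backslash of the sample C line intact.
line = r"char = '\077';"
matches = pattern.findall(line)
print(matches)  # ['char', "'\\077'"]
```

Note that the printed list shows the repr of each match, which is why the single backslash appears doubled.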
Thanks for the help everyone.
You need to define your input as a raw string:
>>> import re
>>> text = r"char x = '\077'; \nprintf(\"%c\", x);"
The r prefix defines a raw string, so Python leaves the backslashes in the literal untouched.
Then use:
>>> print(re.findall(r"'\\[0-7]{1,3}'", text))
["'\\077'"]
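To illustrate why the raw prefix on the input matters, here is a small sketch comparing the same C line written as a plain literal and as a raw literal. The names OCTAL_CHAR, plain and raw are made up for this example. In a plain Python literal, \077 is itself an octal escape, so the backslash never reaches the regex.

```python
import re

OCTAL_CHAR = re.compile(r"'\\[0-7]{1,3}'")

# Plain literal: Python turns \077 into chr(0o77), which is '?'.
plain = "char x = '\077';"
# Raw literal: the backslash and the digits are kept as typed.
raw = r"char x = '\077';"

print(plain)                      # char x = '?';
print(OCTAL_CHAR.findall(plain))  # []
print(OCTAL_CHAR.findall(raw))    # ["'\\077'"]
```

Text read from a file or from stdin is not affected by this, since escape processing only applies to string literals in Python source code.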
Code to read text from stdin and apply the regex:
#!/usr/bin/env python3
import re
import sys

# Read all of stdin and print every quoted octal escape such as '\077'.
text = sys.stdin.read()
print(re.findall(r"'\\[0-7]{1,3}'", text))
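As for why the pattern from the question matches nothing while the one used here does, the sketch below prints both pattern strings as re receives them. The names broken and fixed are only for illustration. Two things appear to go wrong in the original: the non-raw literal collapses \\[ into \[, which is an escaped literal bracket rather than a backslash followed by a character class, and {1-3} is not a valid quantifier, so re treats it as literal text.

```python
import re

# The pattern as written in the question: non-raw literal, {1-3}.
broken = "'\\[0-7]{1-3}'"
# The corrected pattern: raw literal, comma in the quantifier.
fixed = r"'\\[0-7]{1,3}'"

print(broken)  # '\[0-7]{1-3}'   (\[ is a literal bracket)
print(fixed)   # '\\[0-7]{1,3}'  (\\ is a literal backslash)

line = r"char x = '\077';"
print(re.findall(broken, line))  # []
print(re.findall(fixed, line))   # ["'\\077'"]

# The broken pattern only matches its own text spelled out literally.
print(re.findall(broken, "'[0-7]{1-3}'"))  # ["'[0-7]{1-3}'"]
```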