Search code examples
pythonregexstringoctal

Regex and Octal Characters


I'm attempting to write a regex that captures octal characters.

For example, if the line I'm comparing to my regex is:

char x = '\077';

I want my regex to capture '\077'

I attempted to do this via the re module and a regex of the form:

"'\\[0-7]{1-3}'"

But this doesn't capture the octal character. How can octal characters be identified using regex in Python?

Edit:

As an example of what I mean, consider the C code:

char x = '\077'; 
printf("%c", x);

I would like to capture '\077' from the first line.

Edit:

After testing some of the suggestions in this thread, I have a case that works. I realize that after adding the octal regex to a larger regex, I needed to prefix with r for raw input, or escape each backslash, for a total of four backslashes.

For example, both of these solve the problem:

regex = re.compile(r"\s*("                  
                        r"'\\0[0-7]{1,2}'"          # octal
                        "|[a-zA-Z_][a-zA-Z_\d]*"    # identifer
                        ")")
regex.findall(line)

and

regex = re.compile(r"\s*("                  
                        "'\\\\0[0-7]{1,2}'"         # octal
                        "|[a-zA-Z_][a-zA-Z_\d]*"    # identifer
                        ")")
regex.findall(line)

Which will produce '\077' if line is: char = '\077';

Thanks for the help everyone.


Solution

  • You need to define your input as raw string:

    >>> str = r"char x = '\077'; \nprintf(\"%c\", x);"
    

    Prefix r is for defining a raw string.

    Then use:

    >>> print re.findall(ur"'\\[0-7]{1,3}'", str)
    ["'\\077'"]
    

    RegEx Demo


    Code to read text from stdin and apply regex:

    #!/usr/bin/python
    import sys
    import re
    
    str = sys.stdin.read()
    print re.findall(ur"'\\[0-7]{1,3}'", str)