Strange behavior of Python's re on a binary file


I am seeing strange behavior from re.search on a binary file. My Python session was posted as a screenshot (not reproduced here).

As the session shows, I have two problems:

  • re can't find a pattern that is definitely in the file
  • including "\x5b" in an expression results in an error

Any idea?


Solution

  • \x5b is the ASCII [ character, the left square bracket. That's a regex meta character forming the start of a [...] character class specification and needs to be escaped if you want to match a literal [ character:

    >>> import re
    >>> re.search('[', '')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/re.py", line 146, in search
        return _compile(pattern, flags).search(string)
      File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/re.py", line 251, in _compile
        raise error, v # invalid expression
    sre_constants.error: unexpected end of regular expression
    >>> re.search('\[', '')
    

    The same applies to \x5e, which is the ^ character. In a regex, an unescaped ^ outside a character class is an anchor that matches only at the start of the string (or of a line, with re.MULTILINE), not the literal character ^. Since your pattern requires data before the ^, it can never match: nothing can precede the start of the string.

    If you are only searching for literal text matches, don't use a regex. You could just use str.find() or str.index() to get the index of matched text.

    If you are using this in a larger expression and generate the expression from data, then use re.escape() to ensure all metacharacters are properly escaped first.
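
The last two suggestions can be sketched as follows. This is only an illustration under some assumptions: it is written for Python 3, where the contents of a file opened in binary mode are bytes and the pattern must be bytes as well (the traceback above is from Python 2.7, where plain str is used instead), and the payload `data` and the `[A-C]+` prefix are made up for the example.

```python
import re

# Hypothetical payload standing in for the file contents; in practice this
# would come from something like open(path, "rb").read().
data = b"\x00\x01ABC\x5b\x5e\x02\x03"

# The literal bytes to look for: 0x5b is "[" and 0x5e is "^".
needle = b"\x5b\x5e"

# Used unescaped, the "[" opens a character class that is never closed,
# so compiling the pattern fails.
try:
    re.compile(needle)
    compiled_unescaped = True
except re.error:
    compiled_unescaped = False

# Literal search without a regex: find() returns the offset, or -1 if absent.
offset = data.find(needle)

# The same literal inside a larger expression: re.escape() backslash-escapes
# the metacharacters so they match themselves.
pattern = re.compile(b"[A-C]+" + re.escape(needle))
match = pattern.search(data)

print(compiled_unescaped)  # False
print(offset)              # 5
print(match.span())        # (2, 7)
```

Here `find()` is enough when only the position of a fixed byte sequence is needed, and `re.escape()` is only required once that sequence becomes part of a pattern that also contains real regex syntax.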