python, arrays, regex, parsing, jpeg

Parsing JPEG bytestream markers with regex


I am writing a Python program to manipulate information in a JPEG image. However, I am having trouble getting my regular expression to find the byte marker codes used in JPEG images.

For example, the start-of-image marker is \xFF\xD8 and the end-of-image marker is \xFF\xD9. The pattern I tried was rb'\xFF\xD8(.+?)\xFF\xD9', with no success. What should my pattern be if I want to find everything between specific byte markers in a byte array?
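A minimal, self-contained reproduction, using made-up sample bytes rather than a real JPEG (the names PATTERN, without_newline and with_newline are hypothetical). The same pattern finds the markers when the payload has no 0x0A byte and finds nothing when it does:

```python
import re

# The pattern from the question, unchanged (raw bytes literal).
PATTERN = rb'\xFF\xD8(.+?)\xFF\xD9'

# Hypothetical sample data, not a real JPEG: SOI marker, short payload, EOI marker.
without_newline = b'\xff\xd8' + b'\x01\x02\x03' + b'\xff\xd9'
with_newline = b'\xff\xd8' + b'\x01\x0a\x03' + b'\xff\xd9'  # payload contains byte 0x0A

# Finds the markers and captures the payload between them.
print(re.search(PATTERN, without_newline).group(1))  # b'\x01\x02\x03'

# Only difference: one payload byte is 0x0A. No match is found.
print(re.search(PATTERN, with_newline))  # None
```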


Solution

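  • The r prefix only changes how Python reads the literal: the backslashes are kept, so rb'\xFF' is the four bytes '\', 'x', 'F', 'F'. That is not the problem here, because the re module interprets \xhh escapes itself, so rb'\xFF\xD8' still matches the two bytes 0xFF 0xD8.

    The more likely cause is the dot. By default . matches any byte except the newline byte 0x0A, and compressed image data almost always contains that byte somewhere between the markers, so (.+?) cannot reach \xFF\xD9.

    Pass re.DOTALL (also spelled re.S) so that . matches every byte, for example re.search(rb'\xFF\xD8(.+?)\xFF\xD9', data, re.DOTALL). Removing the r prefix is not necessary.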
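A short sketch of the fix, assuming the image is already loaded into a bytes object. The helper between_markers and the sample bytes are hypothetical, for illustration only; re.escape and re.DOTALL are standard re features. It generalizes the pattern to any pair of byte markers:

```python
import re
from typing import List


def between_markers(data: bytes, start: bytes, end: bytes) -> List[bytes]:
    """Return every byte run found between `start` and the next `end`.

    Hypothetical helper. re.escape() makes marker bytes that happen to be
    regex metacharacters match literally, and re.DOTALL lets `.` also match
    the newline byte 0x0A.
    """
    pattern = re.escape(start) + rb'(.+?)' + re.escape(end)
    return re.findall(pattern, data, re.DOTALL)


SOI, EOI = b'\xff\xd8', b'\xff\xd9'  # JPEG start-of-image / end-of-image markers

# Hypothetical sample data (not a real JPEG); the payload contains byte 0x0A.
sample = SOI + b'\x01\x0a\x03' + EOI

# The question's original pattern works once re.DOTALL is passed.
m = re.search(rb'\xFF\xD8(.+?)\xFF\xD9', sample, re.DOTALL)
print(m.group(1))                         # b'\x01\n\x03'
print(between_markers(sample, SOI, EOI))  # [b'\x01\n\x03']
```

One caveat for real files: an EXIF thumbnail is itself a complete JPEG with its own \xFF\xD8 ... \xFF\xD9 pair inside the APP1 segment, so the non-greedy match may stop at the thumbnail's end marker rather than the main image's. For anything beyond a quick search, walking the marker segments and their length fields is more reliable than a regex.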