Tags: python, regex, python-3.9

Have regex skip a match if it occurs within 1024 characters of the previous match


I have the following bytes.replace call:

self.reply = raw_reply.replace(b"<" + rcid + b":", b"") 

where rcid is a command reference (a bytes value) and raw_reply is a large block of binary data, e.g.

<35:\x07\x98c\x45\x09 etc. 
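For reference, the current call removes every marker no matter how close together they are. A minimal sketch with made-up sample bytes (the data below is hypothetical):

```python
# Hypothetical sample reply with two "<35:" markers only a few bytes apart.
rcid = b"35"
raw_reply = b"<35:\x07\x98c\x45\x09" + b"<35:\x01\x02"

# bytes.replace removes every occurrence, however close together they are.
reply = raw_reply.replace(b"<" + rcid + b":", b"")
print(reply)
```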

I want to remove every occurrence of, for example, <35:, but only if no earlier occurrence was removed within the preceding 1024 characters.
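To pin down the rule, here is a plain-loop sketch without regex. It assumes the 1024-byte window is counted from the end of the last removed marker (the question doesn't say whether it is measured from the start or the end) and that the marker is non-empty; the names strip_markers_loop and window are hypothetical.

```python
def strip_markers_loop(data: bytes, marker: bytes, window: int = 1024) -> bytes:
    """Remove `marker` occurrences, but leave any marker that starts within
    `window` bytes after the end of the last removed one (assumed rule).
    Assumes `marker` is non-empty."""
    out = bytearray()
    pos = 0
    while True:
        idx = data.find(marker, pos)
        if idx == -1:
            out += data[pos:]  # no more markers: copy the tail
            return bytes(out)
        out += data[pos:idx]  # copy bytes before the marker, drop the marker
        end = idx + len(marker)
        protected_end = min(end + window, len(data))
        out += data[end:protected_end]  # copy the protected window verbatim
        pos = protected_end  # keep searching only after the window


data = b"<35:" + b"a" * 10 + b"<35:" + b"b" * 10
print(strip_markers_loop(data, b"<35:"))  # second marker kept (too close)
print(strip_markers_loop(data, b"<35:", window=5))  # both removed
```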

Is there a way to do this with regex?

I've looked at exclusions and negative lookahead, but I'm not sure how to apply them when I want to ignore any match that falls within 1024 characters of the previous match.
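One reason the lookaround route is awkward in Python: the obvious attempt needs a variable-width negative look-behind, which the standard re module rejects at compile time. A minimal sketch showing this (the pattern is an illustrative attempt, not a working solution):

```python
import re

marker = re.escape(b"<35:")

# Attempt: match "<35:" only if no other "<35:" appears in the preceding
# 1020 bytes. The stdlib re module only supports fixed-width look-behind,
# so compiling this raises re.error.
try:
    re.compile(b"(?<!" + marker + b".{0,1020})" + marker, re.DOTALL)
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```

Even an engine that allows variable-width look-behind (such as the third-party regex module) would only inspect the original text, not which markers were actually removed. With markers at offsets 0, 600 and 1200, the rule in the question removes the ones at 0 and 1200, while a look-behind test would keep the one at 1200 because of the unremoved marker at 600.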


Solution

  • Use a regular expression that matches the marker plus up to 1024 characters after it. Capture those following characters in a group so you can copy them back in the replacement. Because re.sub never processes overlapping matches, the next match can only start after the captured span, so any marker inside it (i.e. starting within 1024 characters after the end of the removed marker) is left untouched.

    import re
    self.reply = re.sub(b"<" + re.escape(rcid) + b":(.{0,1024})", br"\1", raw_reply, flags=re.DOTALL)
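The same approach wrapped in a standalone function, with a hypothetical window parameter so it can be tested on small inputs. This is a sketch: the name strip_markers and the sample data are illustrative, not part of the original answer.

```python
import re


def strip_markers(raw_reply: bytes, rcid: bytes, window: int = 1024) -> bytes:
    """Remove b"<" + rcid + b":" markers, skipping any that start within
    `window` bytes after the end of the previously removed marker."""
    # Group 1 greedily captures up to `window` bytes after the marker; since
    # re.sub never reuses consumed bytes, markers inside it are left alone.
    pattern = re.compile(
        b"<" + re.escape(rcid) + b":(.{0," + str(window).encode() + b"})",
        re.DOTALL,  # let "." match b"\n" in binary data
    )
    return pattern.sub(br"\1", raw_reply)


# Hypothetical sample: markers at offsets 0, 600 and 1200.
sample = bytearray(b"x" * 1300)
sample[0:4] = b"<35:"
sample[600:604] = b"<35:"
sample[1200:1204] = b"<35:"
result = strip_markers(bytes(sample), b"35")
print(result.count(b"<35:"))  # 1: only the marker at offset 600 survives
```

Note that a marker straddling the end of the window is also kept, because the next search starts only after the captured bytes.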