regex regex-group mutual-exclusion

RegEx exclude sets while grouping all characters 2 by 2


I want to modify a binary file with a pattern. I've converted the file to a plain hexdump with xxd (from package vim). The plain file looks like this (only 1 line with no trailing LF):

$ xxd -ps file.bin | tr -d '\n' | tee out.txt
3a0a5354...

I want to remove all patterns that match \x01[^\xFF]*\xFF (an opening token and a closing token and everything between them except another closing token) in the original file, but sed doesn't work like this.

Example Input and Desired Match:

020202020101010101feeffeefff0000...
        ~~~~~~~~~~~~~~~~~~~~    
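
To make the desired match concrete, here is a minimal Python sketch that applies the byte-level pattern to the bytes of the example above. Python, the function name strip_tokens, and the use of only the visible part of the sample (the elided "..." is left out) are assumptions for illustration and not part of the original setup.

```python
import re

# Byte-level form of the pattern from the question: an opening 0x01,
# then any run of bytes other than 0xFF, then the closing 0xFF.
TOKEN = re.compile(rb"\x01[^\xff]*\xff")


def strip_tokens(data: bytes) -> bytes:
    """Return data with every 0x01 ... 0xFF token removed (hypothetical helper)."""
    return TOKEN.sub(b"", data)


# Only the visible part of the example input is used here.
sample = bytes.fromhex("020202020101010101feeffeefff0000")
print(strip_tokens(sample).hex())
```

Working on the raw bytes avoids the alignment problem entirely, because each byte is a single unit in the pattern.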

And I'm thinking about doing this:

sed 's/regex//g' in.file > out.file

Now I'm trying to match all characters 2-by-2 while excluding ff. Any ideas?


Solution

  • This should do the trick:

    ((..)|01([0-9a-e][0-9a-f]|[0-9a-f][0-9a-e])*ff)*

    That is, between 01 and ff we match pairs of hexadecimal digits in which either the first or the second digit can be f, but not both, so the pair ff itself is excluded. In the surrounding context we must also match everything two characters at a time, to ensure that every match starts at an even offset, i.e. on a byte boundary.

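    Obviously, this expression matches the whole string rather than just the tokens, so you must add something that actually removes the inner 01...ff group from the output, and how to do that is specific to your regex engine. I realized only after posting this that a simple s/regex//g won't do, because it would delete the entire match.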
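
One possible way to do the removal step is sketched below in Python. It is a variation on the expression above, not the exact expression: the token alternative is tried first and every other pair is captured so it can be written back. The function name strip_tokens_hex is hypothetical, and the sketch assumes lowercase hex digits, which is what xxd -ps emits by default.

```python
import re

# Each match consumes either one complete token or exactly one pair, and
# matches follow each other without gaps, so every match starts on a byte
# boundary as long as the first one does.
PAIRWISE = re.compile(
    r"01(?:[0-9a-e][0-9a-f]|[0-9a-f][0-9a-e])*ff"  # a complete token: dropped
    r"|(..)"                                        # any other pair: kept
)


def strip_tokens_hex(hexdump: str) -> str:
    """Remove 01 ... ff tokens from a plain hexdump (hypothetical helper)."""
    # group(1) is None when the token alternative matched, so nothing is
    # written back for a token; an ordinary pair is copied through unchanged.
    return PAIRWISE.sub(lambda m: m.group(1) or "", hexdump)


print(strip_tokens_hex("020202020101010101feeffeefff0000"))
```

A 01 that has no closing ff falls through to the second alternative and is kept as an ordinary pair. The result should be convertible back to binary with xxd -r -p.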