regex, hex, pcre

Is it possible to split a hex string like this using just regex?


Given the following input string (with no whitespace in it):

000102030405060708090A0B0C0D0E0F10111213...

Is it possible to write a substitution regex that would produce the following output?

\t0x00, 0x01, 0x02, ... 0x0F\n
\t0x10, 0x11, 0x12, ...

Basically, split the hex string into groups of 16 bytes, comma-separate the individual bytes within each group, add a 0x prefix to each byte, and insert tab and newline characters on every 16-byte boundary.

I can match two hex characters with ([[:xdigit:]]{2}) and substitute 0x$1 to get individual bytes formatted the way I want, but I can't think of a way to insert the newline and tab after every 16 bytes.

I guess that "Not possible using regex" will be an acceptable answer to the question (if backed with a link to docs that explain why) — what won't be accepted are suggestions to use programming languages to format the string because I already know how to do that.


Solution

  • If you are using a flavour of PCRE that supports conditional substitution, you could do this generically by matching:

    (..)(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?(..)?
    

    i.e. a pair of characters followed by 15 optional pairs of characters, and then replacing it with

    \t0x$1${2:+, 0x$2:}${3:+, 0x$3:}${4:+, 0x$4:}${5:+, 0x$5:}${6:+, 0x$6:}${7:+, 0x$7:}${8:+, 0x$8:}${9:+, 0x$9:}${10:+, 0x$10:}${11:+, 0x$11:}${12:+, 0x$12:}${13:+, 0x$13:}${14:+, 0x$14:}${15:+, 0x$15:}${16:+, 0x$16:}\n
    

    The conditional substitutions add , 0x in front of any of the optional groups that match.

    For this input:

    000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122
    

    The output will be:

        0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F
        0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F
        0x20, 0x21, 0x22
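
As a way to cross-check the output above outside a PCRE tool, here is a minimal Python sketch. Python's re module has no conditional substitution syntax, so the ${n:+...} part is emulated with a replacement callback; the names format_row and format_hex are hypothetical and not part of the answer's regex.

```python
import re

# One mandatory pair followed by 15 optional pairs, as in the pattern above.
PATTERN = re.compile(r"(..)" + r"(..)?" * 15)

def format_row(match):
    # Optional groups that did not take part in the match are None; skip them.
    # This plays the role of the ${n:+, 0x$n:} conditionals.
    pairs = [g for g in match.groups() if g is not None]
    return "\t" + ", ".join("0x" + p for p in pairs) + "\n"

def format_hex(hex_string):
    # Assumes the whole string is hex digits with no whitespace.
    return PATTERN.sub(format_row, hex_string)

if __name__ == "__main__":
    sample = "000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F202122"
    print(format_hex(sample), end="")
```

The last row comes out as a tab followed by 0x20, 0x21, 0x22, because only three of the sixteen groups match there.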
    


    Note that if you know you always have a multiple of 32 digits, you can simplify the regex to:

    (..)(..)(..)(..)(..)(..)(..)(..)(..)(..)(..)(..)(..)(..)(..)(..)
    

    and the substitution to:

    \t0x$1, 0x$2, 0x$3, 0x$4, 0x$5, 0x$6, 0x$7, 0x$8, 0x$9, 0x$10, 0x$11, 0x$12, 0x$13, 0x$14, 0x$15, 0x$16\n
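
The simplified pair can be tried in Python without a callback, as a rough sketch. The one assumption to watch is syntax: Python's re module writes group references as \g<n> instead of $n, so the replacement string is rebuilt in that form. The name format_full_rows is hypothetical.

```python
import re

# Sixteen mandatory pairs: only complete 32-digit rows can match.
PATTERN = re.compile(r"(..)" * 16)

# Same replacement as above, with $n written as \g<n> for Python.
REPLACEMENT = "\t" + ", ".join(rf"0x\g<{i}>" for i in range(1, 17)) + "\n"

def format_full_rows(hex_string):
    return PATTERN.sub(REPLACEMENT, hex_string)

if __name__ == "__main__":
    print(format_full_rows("000102030405060708090A0B0C0D0E0F"), end="")
    # A trailing run shorter than 32 digits is left untouched:
    print(repr(format_full_rows("0001")))
```

This also shows why the multiple-of-32 condition matters: any leftover digits stay in the output unformatted.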
    


    Note that I've assumed you're replacing an entire string of hex digits. If that is not the case, replace each .. (which matches any pair of characters) with [0-9a-f]{2} and add the i flag so that only hex digits within the string are matched; you may also need word boundaries (\b) at each end of the regex.
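
To illustrate the stricter character class, here is a small Python sketch under the same callback assumption as before (Python's re module has no conditional substitution). It uses [0-9a-f]{2} with the case-insensitive flag and leaves the \b anchors out; the names format_row and format_hex_only are hypothetical.

```python
import re

HEX_PAIR = r"[0-9a-f]{2}"

# One mandatory hex pair plus 15 optional ones; IGNORECASE is the i flag.
PATTERN = re.compile(rf"({HEX_PAIR})" + rf"({HEX_PAIR})?" * 15, re.IGNORECASE)

def format_row(match):
    # Skip optional groups that did not participate in the match.
    pairs = [g for g in match.groups() if g is not None]
    return "\t" + ", ".join("0x" + p for p in pairs) + "\n"

def format_hex_only(text):
    # Non-hex characters are never matched, so they pass through unchanged.
    return PATTERN.sub(format_row, text)

if __name__ == "__main__":
    print(repr(format_hex_only("0a0Bzz")))
```

With the .. version, the trailing zz would have been emitted as 0xzz; here it is left as-is.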
