Tags: python, regex, split, python-re, rawbytestring

Split a string and keep the delimiters as part of the split string chunks, not as separate list elements


This is a spin-off from "In Python, how do I split a string and keep the separators?"

rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'

How can I split this rawByteString into parts using b'\\!' as the delimiter without dropping the delimiters, so that I get:

[b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']

I do not want to use [b'\\!' + x for x in rawByteString.split(b'\\!')][1:], as that relies on bytes.split() and is just a workaround; that is why this question is tagged with the "re" module.


Solution

  • You may use

    re.split(rb'(?!\A)(?=\\!)', rawByteString)
    re.split(rb'(?!^)(?=\\!)', rawByteString)
    

    See a sample regex demo (the string input changed since null bytes cannot be part of a string).

    Regex details

    • (?!^) / (?!\A) / (?<!^) - a position other than start of string
    • (?=\\!) - a position immediately followed by a backslash + ! (a positive lookahead, so the delimiter itself is not consumed)
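
To see why both lookarounds are needed, the following minimal sketch compares the pattern with and without the (?!\A) guard. The shortened `sample` input is an assumption made for readability, and the behaviour relies on Python 3.7 or later, where re.split() can split on empty matches.

```python
import re

# Shortened stand-in for rawByteString (assumption: fewer null bytes for readability)
sample = b'\\!\x00\x00\\!\x00\x00'

# Lookahead only: the empty match at index 0 also splits, leaving a leading b''
without_guard = re.split(rb'(?=\\!)', sample)

# With (?!\A): the position at the start of the input is rejected, so no empty chunk
with_guard = re.split(rb'(?!\A)(?=\\!)', sample)

print(without_guard)  # [b'', b'\\!\x00\x00', b'\\!\x00\x00']
print(with_guard)     # [b'\\!\x00\x00', b'\\!\x00\x00']
```

With no flags set, ^ and \A behave the same here; they differ only under re.MULTILINE, where ^ also matches right after a newline.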

    NOTES

    • Since you use a byte string, the b prefix is required when defining the pattern string literal
    • r makes the string literal a raw string literal, so backslashes do not have to be double-escaped: \\ in the pattern matches a single \ in the input.
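
Both notes can be checked directly. This is a small sketch in which `data` and the other variable names are chosen only for illustration: it compares the raw literal with its doubly escaped equivalent and shows that a str pattern is rejected for bytes input.

```python
import re

data = b'\\!\x00'  # illustrative input: one backslash, '!', one null byte

# The raw literal and the doubly escaped literal are the same three bytes: \ \ !
assert rb'\\!' == b'\\\\!'
assert len(rb'\\!') == 3

# As a regex, those bytes match a single backslash followed by '!'
match = re.match(rb'\\!', data)
assert match is not None and match.group() == b'\\!'

# A str pattern cannot be applied to a bytes object
try:
    re.split(r'(?!\A)(?=\\!)', data)
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```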

    See Python demo:

    import re

    rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'
    # Split at every position (except the very start) that is followed by \!
    print(re.split(rb'(?!\A)(?=\\!)', rawByteString))
    

    Output:

    [b'\\!\x00\x00\x00\x00\x00\x00', b'\\!\x00\x00\x00\x00\x00\x00']
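
If the delimiter varies, the same idea can be wrapped in a helper. The function name `split_keep_delimiter` below is hypothetical (it is not part of the re module); the sketch assumes Python 3.7+ and uses re.escape() so that the delimiter bytes are matched literally.

```python
import re


def split_keep_delimiter(data: bytes, delimiter: bytes) -> list:
    """Split data before each delimiter, keeping the delimiter on its chunk.

    Hypothetical helper for illustration; needs Python 3.7+ for empty-match splits.
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    # re.escape() neutralises regex metacharacters such as the backslash
    pattern = rb'(?!\A)(?=' + re.escape(delimiter) + rb')'
    return re.split(pattern, data)


rawByteString = b'\\!\x00\x00\x00\x00\x00\x00\\!\x00\x00\x00\x00\x00\x00'
chunks = split_keep_delimiter(rawByteString, b'\\!')
print(chunks)

# Data that does not start with the delimiter keeps its prefix as the first chunk
prefixed = split_keep_delimiter(b'ab\\!c\\!d', b'\\!')
print(prefixed)  # [b'ab', b'\\!c', b'\\!d']
```

Because the split points are zero-width, joining the chunks with b''.join() gives back the original bytes unchanged.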