Search code examples
pythonpython-3.xxor

fast XORing bytes in python 3


I need to xor 2 bytes objects. I use this code:

def bxor(b1, b2): # use xor for bytes
    result = b""
    for b1, b2 in zip(b1, b2):
        result += bytes([b1 ^ b2])
    return result

It works fine when the bytes objects are small, but if I xor big objects (a few MB) it takes very long time (a few hours). How can I make it faster?


Solution

  • When XORing bytes objects with one million elements each, this loop creates roughly one million temporary bytes objects and copies each byte, on average, roughly 500 thousand times from one temporary bytes to the next. Note that the exact same problem exists for strings (in many other languages, too). The string solution is to create a list of string parts and use ''.join at the end to concatenate them efficiently. You can do the same thing with bytes:

    def bxor(b1, b2): # use xor for bytes
        parts = []
        for b1, b2 in zip(b1, b2):
            parts.append(bytes([b1 ^ b2]))
        return b''.join(parts)
    

    Alternatively, you can use a bytearray which is mutable and can therefore avoid the problem. It also allows you to not allocate a new bytes object on every iteration, you can just append the byte/int.

    def bxor(b1, b2): # use xor for bytes
        result = bytearray()
        for b1, b2 in zip(b1, b2):
            result.append(b1 ^ b2)
        return result
    

    You can alternatively return bytes(result) if you want/need a bytes object.