
Why does base64 encode return bytes instead of string directly?


>>> import base64
>>> base64.b64encode(b'bytes required')
b'Ynl0ZXMgcmVxdWlyZWQ='

If I understand correctly, base64 is a notation for representing bytes as a string. Then why doesn't it give me the string 'Ynl0ZXMgcmVxdWlyZWQ=' directly?

Or does it expect me to decode the result myself?

Like b'Ynl0ZXMgcmVxdWlyZWQ='.decode('ascii') or
b'Ynl0ZXMgcmVxdWlyZWQ='.decode('utf-8')? Both give the same result.


Solution

  • You are correct that base64 is meant to be a textual representation of binary data.

    However, you are neglecting constraints on the actual implementation side of things.

    >>> import sys
    >>> sys.getsizeof("Hello World")
    60
    >>> sys.getsizeof("Hello World".encode("utf-8"))
    44
    

    In CPython, a str object takes up more memory than the equivalent bytes object (the exact figures vary by Python version and platform). This overhead can noticeably degrade performance when working with large amounts of base64-encoded data.

    I also suspect that, since the module was carried over from Python 2.7 (where str was itself a byte string and bytes was just an alias for it), this might simply be a legacy of the language's history.
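For the decoding step the question asks about, here is a minimal sketch using only the standard-library base64 module. The variable names are illustrative. Base64 output uses only ASCII characters, and UTF-8 encodes ASCII characters identically, so both codecs decode the result to the same string.

```python
import base64

raw = b"bytes required"

# b64encode takes a bytes-like object and returns bytes.
encoded = base64.b64encode(raw)

# The base64 alphabet is pure ASCII, so decoding with "ascii" is enough
# to obtain a str.
text = encoded.decode("ascii")

# UTF-8 is a superset of ASCII, so it yields the identical string.
assert encoded.decode("utf-8") == text

# b64decode accepts either the bytes or the ASCII str form.
assert base64.b64decode(encoded) == raw
assert base64.b64decode(text) == raw
```

Decoding with 'ascii' arguably states the intent more precisely, since it would raise an error if the data ever contained non-ASCII bytes.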