Why does base64 encode return bytes instead of string directly?

import base64
base64.b64encode(b'bytes required')

>>>b'Ynl0ZXMgcmVxdWlyZWQ='

If I understand correctly, base64 is a bytes <----> string notation. Then why doesn't it give me string 'Ynl0ZXMgcmVxdWlyZWQ=' directly?

Or does it expect me to do some decoding furthermore?

Like b'Ynl0ZXMgcmVxdWlyZWQ='.decode('ascii') or
b'Ynl0ZXMgcmVxdWlyZWQ='.decode('utf-8') ? But they result in the same thing.

Solution

You are correct that base64 is meant to be a textual representation of binary data.

However, you are neglecting constraints on the actual implementation side of things.

import sys

>>> sys.getsizeof("Hello World")
60
>>> sys.getsizeof("Hello World".encode("utf-8"))
44

str objects simply take up more system resources than bytes. This overhead can lead to non-trivial degradation in performance when working with larger bodies of base64 encoded data.

I also suspect that since the original python module was ported from python2.7 (which did not distinguish between str and bytes), that this might also just be a legacy part of the language.