
Why is str not a subclass of collections.abc.ByteString?


When looking at the types str and bytes in Python, it turns out they are very similar. The only differences with respect to their attributes are:

>>> set(dir(bytes)) - set(dir(str))
{'hex', 'fromhex', 'decode'}
>>> set(dir(str)) - set(dir(bytes))
{'isidentifier', 'encode', 'isdecimal', 'isnumeric', 'casefold', 'format', 'isprintable', 'format_map'}

Checking the Python documentation, I concluded that these differences should not be relevant to their relationship with the abstract base class collections.abc.ByteString. However, bytes is regarded as a subclass while str is not:

>>> import collections.abc
>>> issubclass(bytes, collections.abc.ByteString)
True
>>> issubclass(str, collections.abc.ByteString)
False

While the observed behaviour is useful for distinguishing these types, I do not understand why Python behaves that way. In my understanding of Python's duck typing concept, both str and bytes should be regarded as subclasses, as long as they provide the relevant attributes.


Solution

  • A str isn't a string of bytes; it is a sequence of Unicode code points. The meaning of ByteString is not captured by its methods alone, and str does not fit that meaning. (The ABC mostly exists as a way to bundle bytes and bytearray for isinstance checks, hence the "This unifies bytes and bytearray." in its docstring.)

    You might wonder why issubclass doesn't consider str a ByteString subclass anyway, based on its methods. issubclass only infers a subclass relationship from the presence of particular methods when the ABC implements __subclasshook__ to check for them, and ByteString does not. bytes and bytearray are subclasses of ByteString because they are explicitly registered as such.
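
    The difference between explicit registration and a method-checking __subclasshook__ can be sketched as follows. This is only an illustration: Greeter, Duck and Robot are hypothetical names made up for this example, and Greeter stands in for an ABC that, like ByteString, defines no __subclasshook__. collections.abc.Sized is used as the contrasting case because it does check for a method (__len__).

```python
from abc import ABC, abstractmethod
from collections.abc import Sized


class Greeter(ABC):
    """Hypothetical ABC that defines no __subclasshook__."""

    @abstractmethod
    def greet(self):
        ...


class Duck:
    """Has greet() and __len__(), but is neither derived from nor registered with Greeter."""

    def greet(self):
        return "quack"

    def __len__(self):
        return 1


class Robot:
    """Has greet() only; registered with Greeter further down."""

    def greet(self):
        return "beep"


# Without a hook, having the right method is not enough.
print(issubclass(Duck, Greeter))   # False

# Explicit registration makes Robot a "virtual subclass" of Greeter.
Greeter.register(Robot)
print(issubclass(Robot, Greeter))  # True

# Sized implements __subclasshook__ and looks for __len__,
# so here the presence of the method does decide the outcome.
print(issubclass(Duck, Sized))     # True
print(issubclass(Robot, Sized))    # False
```

    In these terms, bytes and bytearray play the role of Robot (registered by hand), while str is in the position of Duck with respect to Greeter: it has the methods, but nobody registered it.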