python, arrays, python-internals, abc

Why is bytearray not a Sequence in Python 2?


I'm seeing a weird discrepancy in behavior between Python 2 and 3.

In Python 3 things seem to work fine:

Python 3.5.0rc2 (v3.5.0rc2:cc15d736d860, Aug 25 2015, 04:45:41) [MSC v.1900 32 bit (Intel)] on win32
>>> from collections import Sequence
>>> isinstance(bytearray(b"56"), Sequence)
True

But not in Python 2:

Python 2.7.10 (default, May 23 2015, 09:44:00) [MSC v.1500 64 bit (AMD64)] on win32
>>> from collections import Sequence
>>> isinstance(bytearray("56"), Sequence)
False

The results seem to be consistent across minor releases of both Python 2.x and 3.x. Is this a known bug? Is it a bug at all? Is there any logic behind this difference?

I am actually more concerned about whether the C API function PySequence_Check properly identifies an object of type PyByteArray_Type as exposing the sequence protocol. From the source code it looks like it should, but any insight into this whole thing is very welcome.


Solution

  • Abstract classes from collections use ABCMeta.register(subclass), which the documentation describes as follows:

    Register subclass as a “virtual subclass” of this ABC.
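
To make "virtual subclass" concrete, here is a minimal sketch for Python 3. The `Deck` class is hypothetical and exists only for illustration. It suggests that registration changes the answer of `issubclass` without touching the class itself: the ABC does not appear in the MRO and its mixin methods are not inherited.

```python
from collections.abc import Sequence  # Python 3.3+


class Deck:
    """Hypothetical container that merely *looks* like a sequence."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        return self._cards[index]


# Sequence does no structural ("duck-typing") check, so this is False.
before = issubclass(Deck, Sequence)

# Explicit registration makes Deck a virtual subclass of Sequence.
Sequence.register(Deck)
after = issubclass(Deck, Sequence)

# Registration does not change the class: no new base, no mixin methods.
in_mro = Sequence in Deck.__mro__
has_index = hasattr(Deck, "index")

print(before, after, in_mro, has_index)
```

The same mechanism is what decides the `isinstance(bytearray(...), Sequence)` result discussed in this answer.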

    In Python 3, issubclass(bytearray, Sequence) returns True because bytearray is explicitly registered as a virtual subclass of both ByteString (which derives from Sequence) and MutableSequence. See the relevant part of Lib/_collections_abc.py:

    class ByteString(Sequence):
    
        """This unifies bytes and bytearray.
    
        XXX Should add all their methods.
        """
    
        __slots__ = ()
    
    ByteString.register(bytes)
    ByteString.register(bytearray)
    ...
    MutableSequence.register(bytearray)  # Multiply inheriting, see ByteString
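
As a quick check of what those registrations imply, the sketch below queries the ABCs directly on Python 3. It imports from `collections.abc`, which is where these classes have lived since Python 3.3 (the plain `collections` aliases used in the question were removed in Python 3.10). The `checks` dictionary is just an illustrative name.

```python
from collections.abc import MutableSequence, Sequence

# Ask each ABC whether the built-in byte types count as (virtual) subclasses.
checks = {
    "bytes/Sequence": issubclass(bytes, Sequence),
    "bytes/MutableSequence": issubclass(bytes, MutableSequence),
    "bytearray/Sequence": issubclass(bytearray, Sequence),
    "bytearray/MutableSequence": issubclass(bytearray, MutableSequence),
}

for label, value in checks.items():
    print(label, value)
```

Only the immutable `bytes` type should fail the `MutableSequence` check.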
    

    Python 2 never registers bytearray with any of these ABCs (from Lib/_abcoll.py):

    Sequence.register(tuple)
    Sequence.register(basestring)
    Sequence.register(buffer)
    Sequence.register(xrange)
    ...
    MutableSequence.register(list)
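
If Python 2 code needs the Python 3 answer, one possible workaround is to perform the missing registration by hand. This is only a sketch, not something the standard library does for you: it mutates the ABC for the whole process, so it is best done once at start-up. The variable names are illustrative.

```python
try:
    from collections.abc import MutableSequence, Sequence  # Python 3.3+
except ImportError:
    from collections import MutableSequence, Sequence  # Python 2.6 / 2.7

# False on Python 2.7 (nothing registered), already True on Python 3.
was_sequence = isinstance(bytearray(b"56"), Sequence)

# The same call Python 3 makes in Lib/_collections_abc.py.
# Repeating it on Python 3 is harmless.
MutableSequence.register(bytearray)

# MutableSequence derives from Sequence, so both checks now pass.
is_sequence = isinstance(bytearray(b"56"), Sequence)
is_mutable = isinstance(bytearray(b"56"), MutableSequence)

print(was_sequence, is_sequence, is_mutable)
```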
    

    This behaviour was changed in Python 3.0, in a commit with the following message:

    Add ABC ByteString which unifies bytes and bytearray (but not memoryview). There's no ABC for "PEP 3118 style buffer API objects" because there's no way to recognize these in Python (apart from trying to use memoryview() on them).

    And there's more information in PEP 3119:

    This is a proposal to add Abstract Base Class (ABC) support to Python 3000. It proposes: [...] Specific ABCs for containers and iterators, to be added to the collections module.

    Much of the thinking that went into the proposal is not about the specific mechanism of ABCs, as contrasted with Interfaces or Generic Functions (GFs), but about clarifying philosophical issues like "what makes a set", "what makes a mapping" and "what makes a sequence".

    [...] a metaclass for use with ABCs that will allow us to add an ABC as a "virtual base class" (not the same concept as in C++) to any class, including to another ABC. This allows the standard library to define ABCs Sequence and MutableSequence and register these as virtual base classes for built-in types like basestring, tuple and list, so that for example the following conditions are all true: [...] issubclass(bytearray, MutableSequence).

    As an aside, memoryview was registered as a subclass of Sequence only in Python 3.4:

    There's no ducktyping for this due to the Sequence/Mapping confusion so it's a simple missing explicit registration.

    (see issue18690 for details).


    PySequence_Check from the Python C API does not rely on the collections module at all:

    int
    PySequence_Check(PyObject *s)
    {
        if (PyDict_Check(s))
            return 0;
        return s != NULL && s->ob_type->tp_as_sequence &&
            s->ob_type->tp_as_sequence->sq_item != NULL;
    }
    

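    After ruling out dicts, it checks that the type's tp_as_sequence field is non-NULL and, if so, that its sq_item slot (the C-level slot for integer-indexed item access, basically getitem) is non-NULL as well. bytearray fills in both, so PySequence_Check reports it as a sequence regardless of what the collections ABCs say.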
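
The C-level check can also be probed without writing an extension module. The sketch below assumes CPython and calls the real `PySequence_Check` through `ctypes.pythonapi`. The `check` and `results` names are only illustrative.

```python
import ctypes

# PySequence_Check(PyObject *) -> int, reached through CPython's own C API.
check = ctypes.pythonapi.PySequence_Check
check.argtypes = [ctypes.py_object]
check.restype = ctypes.c_int

results = {
    "bytearray": check(bytearray(b"56")),  # has tp_as_sequence->sq_item
    "list": check([5, 6]),                 # has tp_as_sequence->sq_item
    "dict": check({}),                     # rejected by the PyDict_Check guard
    "int": check(56),                      # no tp_as_sequence at all
}

print(results)
```

Because this never consults the ABC registry, it should give the same answer for bytearray on Python 2 and Python 3.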