Search code examples
pythonlistcpythonpython-internals

how to find the implementation of [::-1] ( reversing list in python ) in CPython source code


I was trying to reverse a list in python. There are many methods out there and [::-1] seems to be a great one! But I'm curious how [::-1] is done? What's the time complexity of it?

I searched through CPython repo in github, but I couldn't find any clue. My strategy of searching was that I searched the key word: :: in CPython repo. Since there is :: in the python syntax, i.e. [::-1], there may be :: in the CPython source code so that [::-1] can refer to it and does the reversing. Does this make sense?


Solution

  • list[...] is using indexing, or more accurate subscription syntax, and [start:stop:step] is a slicing. Python 3 passes a slice() object to the subscription call for such cases, there is no :: in the CPython source code as that's not how the language parser would threat that part. Slicing notation allows for defaults, an empty slot between in the start:stop:step parts means that a default is picked, and list[::-1] leaves start and stop at defaults (represented by None, and the step value is set to -1.

    All this just means you need to remember to separate syntax from operations. You could analyse the syntax by parsing it into an abstract syntax tree, or by disassembling the bytecode generated for it. When you generate an abstract syntax tree for list[::-1] you'll see that the parser separates out the slice part:

    >>> import ast
    >>> ast.dump(ast.parse('list[::-1]', '', 'eval').body)  # expression body only
    "Subscript(value=Name(id='list', ctx=Load()), slice=Slice(lower=None, upper=None, step=UnaryOp(op=USub(), operand=Num(n=1))), ctx=Load())"
    

    So the Subscript() syntax node operates on the name list, passing in a slice.

    Producing a disassembly for this operation shows the bytecode used:

    >>> dis.dis('list[::-1]')
      1           0 LOAD_NAME                0 (list)
                  2 LOAD_CONST               0 (None)
                  4 LOAD_CONST               0 (None)
                  6 LOAD_CONST               1 (-1)
                  8 BUILD_SLICE              3
                 10 BINARY_SUBSCR
                 12 RETURN_VALUE
    

    Here the BUILD_SLICE() bytecode operation takes the top-three elements from the stack (put there by LOAD_CONST operations at indices 2, 4 and 6) to build a slice() object, which is then passed to the list object (next on the stack) with BINARY_SUBSCR. That last bytecode is the Subscript() operation in the AST, and the bytecodes 2-8 are the Slice() object in the AST.

    With the bytecode in hand, you can then head over to the Python bytecode evaluation loop to see what those bytecodes actually do. I'll skip BUILD_SLICE, that's just creating a simple slice() object to hold the start, stop and step values.

    Focusing on the BINARY_SUBSCR opcode, we find:

    TARGET(BINARY_SUBSCR) {
        PyObject *sub = POP();
        PyObject *container = TOP();
        PyObject *res = PyObject_GetItem(container, sub);
        Py_DECREF(container);
        Py_DECREF(sub);
        SET_TOP(res);
        if (res == NULL)
            goto error;
        DISPATCH();
    }
    

    So Python takes the top of the stack and puts that in sub (that's the slice() object here), and puts the list reference into container, and then calls PyObject_GetItem(container, sub). That's it. Next step then is to find PyObject_GetItem(). That's defined in abstract.c, and the first thing it does is see if the object is a mapping type:

    m = o->ob_type->tp_as_mapping;
    if (m && m->mp_subscript) {
        PyObject *item = m->mp_subscript(o, key);
        assert((item != NULL) ^ (PyErr_Occurred() != NULL));
        return item;
    }
    

    ->ob_type is the object class (type(object) does the same), and ->tp_as_mapping is set when the mapping protocol is supported. If the mapping protocol is supported then m->mp_subscript(o, key); is called with o being the list object, and m being the list mapping support definition.

    Lists support this protocol, because only this protocol supports slice objects (it implements the sequence protocol as well, but that only accepts integers and a much older, more limited form of slicing with just two indices). We can see that list implements the protocol by looking for a tp_as_mapping entry in the PyTypeObject PyList_Type type definition:

    PyTypeObject PyList_Type = {
        PyVarObject_HEAD_INIT(&PyType_Type, 0)
        "list",
        sizeof(PyListObject),
        0,
        // ...
        &list_as_mapping, /* tp_as_mapping */
        // ...
    };
    

    which points to list_as_mapping which is defined as

    static PyMappingMethods list_as_mapping = {
        (lenfunc)list_length,
        (binaryfunc)list_subscript,
        (objobjargproc)list_ass_subscript
    };
    

    So now we know where to find the actual implementation for the indexing operation; that implementation is handled by the list_subscript() function, which uses PySlice_Check() to check for slice() objects, and then just creates a new list object and copies the indices over:

    if (slicelength <= 0) {
        return PyList_New(0);
    }
    else if (step == 1) {
        return list_slice(self, start, stop);
    }
    else {
        result = list_new_prealloc(slicelength);
        if (!result) return NULL;
    
    
        src = self->ob_item;
        dest = ((PyListObject *)result)->ob_item;
        for (cur = start, i = 0; i < slicelength;
             cur += (size_t)step, i++) {
            it = src[cur];
            Py_INCREF(it);
            dest[i] = it;
        }
        Py_SIZE(result) = slicelength;
        return result;
    }
    

    For a non-empty slice, with a step size of -1, the else branch is taken, where list_new_prealloc() creates a new, pre-allocated list to accommodate all slice indices, then a for loop that uses cur += (size_t)step is used to generate the index to copy into the new list.

    For your list[::-1] slice, the list object is passed in a slice(None, None, -1) object, and list_subscript uses the PySlice_Unpack() and PySlice_AdjustIndices() functions to turn that into concrete slicelength, start, stop and step values; for a negative step and start and stop both set to None the end result is that slicelength is set to len(list), start is set to len(list) - 1, and stop to -1.

    Thus, the effect on the above for loop is that all elements, starting at index len(list) - 1 and iterating through to index 0, are copied in to the new list object.

    So reversing with list[::-1] is a O(N) operation, for length N of the list (this includes pre-allocation, which takes up to O(N) time to allocate memory for the list references of the new list).