I was trying to reverse a list in python.
There are many methods out there and [::-1]
seems to be a great one!
But I'm curious how [::-1]
is done? What's the time complexity of it?
I searched through CPython repo in github, but I couldn't find any clue.
My strategy of searching was that I searched the key word: ::
in CPython repo.
Since there is ::
in the python syntax, i.e. [::-1]
, there may be ::
in the CPython source code so that [::-1]
can refer to it and does the reversing. Does this make sense?
list[...]
is using indexing, or more accurate subscription syntax, and [start:stop:step]
is a slicing. Python 3 passes a slice()
object to the subscription call for such cases, there is no ::
in the CPython source code as that's not how the language parser would threat that part. Slicing notation allows for defaults, an empty slot between in the start:stop:step
parts means that a default is picked, and list[::-1]
leaves start
and stop
at defaults (represented by None
, and the step value is set to -1
.
All this just means you need to remember to separate syntax from operations. You could analyse the syntax by parsing it into an abstract syntax tree, or by disassembling the bytecode generated for it. When you generate an abstract syntax tree for list[::-1]
you'll see that the parser separates out the slice part:
>>> import ast
>>> ast.dump(ast.parse('list[::-1]', '', 'eval').body) # expression body only
"Subscript(value=Name(id='list', ctx=Load()), slice=Slice(lower=None, upper=None, step=UnaryOp(op=USub(), operand=Num(n=1))), ctx=Load())"
So the Subscript()
syntax node operates on the name list
, passing in a slice.
Producing a disassembly for this operation shows the bytecode used:
>>> dis.dis('list[::-1]')
1 0 LOAD_NAME 0 (list)
2 LOAD_CONST 0 (None)
4 LOAD_CONST 0 (None)
6 LOAD_CONST 1 (-1)
8 BUILD_SLICE 3
10 BINARY_SUBSCR
12 RETURN_VALUE
Here the BUILD_SLICE()
bytecode operation takes the top-three elements from the stack (put there by LOAD_CONST
operations at indices 2, 4 and 6) to build a slice()
object, which is then passed to the list
object (next on the stack) with BINARY_SUBSCR
. That last bytecode is the Subscript()
operation in the AST, and the bytecodes 2-8 are the Slice()
object in the AST.
With the bytecode in hand, you can then head over to the Python bytecode evaluation loop to see what those bytecodes actually do. I'll skip BUILD_SLICE
, that's just creating a simple slice()
object to hold the start
, stop
and step
values.
Focusing on the BINARY_SUBSCR
opcode, we find:
TARGET(BINARY_SUBSCR) {
PyObject *sub = POP();
PyObject *container = TOP();
PyObject *res = PyObject_GetItem(container, sub);
Py_DECREF(container);
Py_DECREF(sub);
SET_TOP(res);
if (res == NULL)
goto error;
DISPATCH();
}
So Python takes the top of the stack and puts that in sub
(that's the slice()
object here), and puts the list
reference into container
, and then calls PyObject_GetItem(container, sub)
. That's it. Next step then is to find PyObject_GetItem()
. That's defined in abstract.c
, and the first thing it does is see if the object is a mapping type:
m = o->ob_type->tp_as_mapping;
if (m && m->mp_subscript) {
PyObject *item = m->mp_subscript(o, key);
assert((item != NULL) ^ (PyErr_Occurred() != NULL));
return item;
}
->ob_type
is the object class (type(object)
does the same), and ->tp_as_mapping
is set when the mapping protocol is supported. If the mapping protocol is supported then m->mp_subscript(o, key);
is called with o
being the list object, and m
being the list mapping support definition.
Lists support this protocol, because only this protocol supports slice objects (it implements the sequence protocol as well, but that only accepts integers and a much older, more limited form of slicing with just two indices). We can see that list
implements the protocol by looking for a tp_as_mapping
entry in the PyTypeObject PyList_Type
type definition:
PyTypeObject PyList_Type = {
PyVarObject_HEAD_INIT(&PyType_Type, 0)
"list",
sizeof(PyListObject),
0,
// ...
&list_as_mapping, /* tp_as_mapping */
// ...
};
which points to list_as_mapping
which is defined as
static PyMappingMethods list_as_mapping = {
(lenfunc)list_length,
(binaryfunc)list_subscript,
(objobjargproc)list_ass_subscript
};
So now we know where to find the actual implementation for the indexing operation; that implementation is handled by the list_subscript()
function, which uses PySlice_Check()
to check for slice()
objects, and then just creates a new list object and copies the indices over:
if (slicelength <= 0) {
return PyList_New(0);
}
else if (step == 1) {
return list_slice(self, start, stop);
}
else {
result = list_new_prealloc(slicelength);
if (!result) return NULL;
src = self->ob_item;
dest = ((PyListObject *)result)->ob_item;
for (cur = start, i = 0; i < slicelength;
cur += (size_t)step, i++) {
it = src[cur];
Py_INCREF(it);
dest[i] = it;
}
Py_SIZE(result) = slicelength;
return result;
}
For a non-empty slice, with a step size of -1
, the else
branch is taken, where list_new_prealloc()
creates a new, pre-allocated list to accommodate all slice indices, then a for
loop that uses cur += (size_t)step
is used to generate the index to copy into the new list.
For your list[::-1]
slice, the list
object is passed in a slice(None, None, -1)
object, and list_subscript
uses the PySlice_Unpack()
and PySlice_AdjustIndices()
functions to turn that into concrete slicelength
, start
, stop
and step
values; for a negative step
and start
and stop
both set to None
the end result is that slicelength
is set to len(list)
, start
is set to len(list) - 1
, and stop
to -1
.
Thus, the effect on the above for
loop is that all elements, starting at index len(list) - 1
and iterating through to index 0
, are copied in to the new list object.
So reversing with list[::-1]
is a O(N) operation, for length N of the list (this includes pre-allocation, which takes up to O(N) time to allocate memory for the list references of the new list).