While trying to implement some deep magic that I'd rather not get into here (I should be able to figure it out if I get an answer for this), it occurred to me that __new__ doesn't work the same way for classes that define it as for classes that don't. Specifically: when you define __new__ yourself, it will be passed arguments that mirror those of __init__, but the default implementation doesn't accept any. This makes some sense, in that object is a builtin type and doesn't need those arguments for itself.
However, it leads to the following behaviour, which I find quite vexatious:
>>> class example:
...     def __init__(self, x): # a parameter other than `self` is necessary to reproduce
...         pass
>>> example(1) # no problem, we can create instances.
<__main__.example object at 0x...>
>>> example.__new__ # it does exist:
<built-in method __new__ of type object at 0x...>
>>> old_new = example.__new__ # let's store it for later, and try something evil:
>>> example.__new__ = 'broken'
>>> example(1) # Okay, of course that will break it...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable
>>> example.__new__ = old_new # but we CAN'T FIX IT AGAIN
>>> example(1) # the argument isn't accepted any more:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
>>> example() # But we can't omit it either due to __init__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'x'
Okay, but that's just because we still have something explicitly attached to example, so it's shadowing the default, which breaks some descriptor thingy... right? Except not:
>>> del example.__new__ # if we get rid of it, the problem persists
>>> example(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object.__new__() takes exactly one argument (the type to instantiate)
>>> assert example.__new__ is old_new # even though the lookup gives us the same object!
The same thing still happens if we directly add and remove the attribute, without replacing it in between. Simply assigning and removing an attribute breaks the class, apparently irrevocably, and makes it impossible to instantiate. It's as if the class had some hidden attribute that tells it how to call __new__, which has been silently corrupted.
When we instantiate example at the start, how does Python actually find the base __new__ (it apparently finds object.__new__, but is it looking directly in object? Getting there indirectly via type? Something else?), and how does it decide that this __new__ should be called without arguments, even though it would pass an argument if we wrote a __new__ method inside the class? Why does that logic break if we temporarily mess with the class's __new__, even if we restore everything such that there is no observable net change?
The issues you're seeing aren't related to how Python finds __new__ or chooses its arguments. __new__ receives every argument you're passing. The effects you observed come from specific code in object.__new__, combined with a bug in the logic for updating the C-level tp_new slot.
There's nothing special about how Python passes arguments to __new__. What's special is what object.__new__ does with those arguments.
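One way to check this from the Python side is to give a class its own __new__ that simply records what it was handed. This is only a minimal sketch: the class name Probe and the calls list are made up for illustration and are not part of the question.

```python
calls = []  # hypothetical log, used only for this illustration


class Probe:
    def __new__(cls, *args, **kwargs):
        calls.append(("__new__", args, kwargs))
        # Forward only the class: because __new__ is overridden here,
        # object.__new__ would reject any extra arguments.
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        calls.append(("__init__", args, kwargs))


Probe(1, key="value")
for name, args, kwargs in calls:
    print(name, args, kwargs)
```

Both entries in the log carry the same positional and keyword arguments, so nothing is filtered out on the way to __new__.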
object.__new__ and object.__init__ expect one argument, the class to instantiate for __new__ and the object to initialize for __init__. If they receive any extra arguments, they will either ignore the extra arguments or throw an exception, depending on what methods have been overridden:
- If exactly one of __new__ or __init__ is overridden, the non-overridden object method should ignore extra arguments, so people aren't forced to override both.
- If a subclass __new__ or __init__ explicitly passes extra arguments to object.__new__ or object.__init__, the object method should raise an exception.
- If neither __new__ nor __init__ is overridden, both object methods should throw an exception for extra arguments.
There's a big comment in the source code talking about this.
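The three rules can be observed with a few throwaway classes. This is a sketch that assumes CPython 3; the class names and the accepts_extra_argument helper are invented for the example.

```python
def accepts_extra_argument(cls):
    """Return True if cls(1) succeeds, False if it raises TypeError."""
    try:
        cls(1)
    except TypeError:
        return False
    return True


class OnlyInit:
    # Only __init__ is overridden, so object.__new__ ignores the extra argument.
    def __init__(self, x):
        self.x = x


class OnlyNew:
    # Only __new__ is overridden, so object.__init__ ignores the extra argument.
    def __new__(cls, x):
        return super().__new__(cls)


class Neither:
    # Nothing is overridden, so the extra argument is rejected.
    pass


class ForwardsExtra:
    # Explicitly hands the extra argument to object.__new__, which rejects it.
    def __new__(cls, x):
        return super().__new__(cls, x)


for cls in (OnlyInit, OnlyNew, Neither, ForwardsExtra):
    print(cls.__name__, accepts_extra_argument(cls))
```

On CPython 3 the first two classes accept the argument and the last two raise TypeError.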
At C level, __new__ and __init__ correspond to tp_new and tp_init function pointer slots in a class's memory layout. Under normal circumstances, if one of these methods is implemented in C, the slot will point directly to the C-level implementation, and a Python method object will be generated wrapping the C function. If the method is implemented in Python, the slot will point to the slot_tp_new function, which searches the MRO for a __new__ method object and calls it. When instantiating an object, Python will invoke __new__ and __init__ by calling the tp_new and tp_init function pointers.
object.__new__ is implemented by the object_new C-level function, and object.__init__ is implemented by object_init. object's tp_new and tp_init slots are set to point to these functions.
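The slots themselves aren't visible from Python, but the method object that wraps object's C implementation is. The following sketch (the class name Plain is made up) shows that a class without its own __new__ has nothing in its own namespace and that ordinary attribute lookup lands on the entry stored on object:

```python
class Plain:
    def __init__(self, x):
        self.x = x


# Plain defines no __new__ of its own...
print("__new__" in vars(Plain))
# ...so attribute lookup walks the MRO and finds the one stored on object.
print("__new__" in vars(object))
print(Plain.__new__ is object.__new__)
# The object found is a wrapper around C code, not a Python function.
print(type(object.__new__).__name__)
```

This only shows what attribute lookup returns. Which C function the tp_new slot points to is a separate piece of state, and that is the part that matters for the behaviour in the question.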
object_new and object_init check whether they're overridden by checking a class's tp_new and tp_init slots. If tp_new points to something other than object_new, then __new__ has been overridden, and similar for tp_init and __init__.
static PyObject *
object_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    if (excess_args(args, kwds)) {
        if (type->tp_new != object_new) {
            PyErr_SetString(PyExc_TypeError,
                            "object.__new__() takes exactly one argument (the type to instantiate)");
            return NULL;
        }
        ...
Now, when you assign or delete __new__, Python has to update the tp_new slot to reflect this. When you assign __new__ on a class, Python sets the class's tp_new slot to the generic slot_tp_new function, which searches for a __new__ method and calls it. When you delete __new__, the class is supposed to re-inherit tp_new from the superclass, but the code has a bug:
else if (Py_TYPE(descr) == &PyCFunction_Type &&
         PyCFunction_GET_FUNCTION(descr) ==
         (PyCFunction)(void(*)(void))tp_new_wrapper &&
         ptr == (void**)&type->tp_new)
{
    /* The __new__ wrapper is not a wrapper descriptor,
       so must be special-cased differently.
       If we don't do this, creating an instance will
       always use slot_tp_new which will look up
       __new__ in the MRO which will call tp_new_wrapper
       which will look through the base classes looking
       for a static base and call its tp_new (usually
       PyType_GenericNew), after performing various
       sanity checks and constructing a new argument
       list.  Cut all that nonsense short -- this speeds
       up instance creation tremendously. */
    specific = (void *)type->tp_new;
    /* XXX I'm not 100% sure that there isn't a hole
       in this reasoning that requires additional
       sanity checks.  I'll buy the first person to
       point out a bug in this reasoning a beer. */
}
In the specific = (void *)type->tp_new; line, type is the wrong type - it's the class whose slot we're trying to update, not the class we're supposed to inherit tp_new from.
When this code finds a __new__ method written in C, instead of updating tp_new to point to the corresponding C function, it sets tp_new to whatever value it already had! It doesn't change tp_new at all!
So initially, your example class has tp_new set to object_new, and object_new ignores extra arguments because it sees that __init__ is overridden and __new__ isn't.
When you set example.__new__ = 'broken', Python sets example's tp_new to slot_tp_new. Nothing you do after that point changes tp_new to anything else, even though del example.__new__ really should have.
When object_new finds that example's tp_new is slot_tp_new instead of object_new, it rejects extra arguments and throws an exception.
The bug manifests in some other ways too. For example,
>>> class Example: pass
...
>>> Example.__new__ = tuple.__new__
>>> Example()
<__main__.Example object at 0x7f9d0a38f400>
Before the __new__ assignment, Example has tp_new set to object_new. When the example does Example.__new__ = tuple.__new__, Python finds that tuple.__new__ is implemented in C, so it fails to update tp_new, leaving it set to object_new. Then, in Example(), tuple.__new__ should raise an exception, because tuple.__new__ isn't applicable to Example:
>>> tuple.__new__(Example)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: tuple.__new__(Example): Example is not a subtype of tuple
but because tp_new is still set to object_new, object_new gets called instead of tuple.__new__.
The devs have tried to fix the buggy code several times, but each fix was itself buggy and got reverted. The second attempt got closer, but broke multiple inheritance - see the conversation in the bug tracker.