c++ initialization python-c-api new-style-class pycxx

How to tidy/fix PyCXX's creation of new-style Python extension-class?

I've nearly finished rewriting a C++ Python wrapper (PyCXX).

The original allows old and new style extension classes, but also allows one to derive from the new-style classes:

import test

# ok
a = test.new_style_class()

# also ok
class Derived( test.new_style_class ):
    def __init__( self ):
        test.new_style_class.__init__( self )

    def derived_func( self ):
        print( 'derived_func' )
        super().func_noargs()

    def func_noargs( self ):
        print( 'derived func_noargs' )

d = Derived()

The code is convoluted, and appears to contain errors (Why does PyCXX handle new-style classes in the way it does?)

My question is: What is the rationale/justification for PyCXX's convoluted mechanism? Is there a cleaner alternative?

I will attempt to detail below where I am at with this enquiry. First I will try and describe what PyCXX is doing at the moment, then I will describe what I think could maybe be improved.

When the Python runtime encounters d = Derived(), it does PyObject_Call( ob ) where ob is the PyTypeObject for NewStyleClass. I will write ob as NewStyleClass_PyTypeObject.

That PyTypeObject has been constructed in C++ and registered using PyType_Ready.

PyObject_Call will invoke type_call(PyTypeObject *type, PyObject *args, PyObject *kwds), returning an initialised Derived instance i.e.

PyObject* derived_instance = type_call(NewStyleClass_PyTypeObject, NULL, NULL)

Something like this.

(All of this comes from http://eli.thegreenplace.net/2012/04/16/python-object-creation-sequence by the way. Thanks, Eli!)

type_call does essentially:

obj = type->tp_new(type, args, kwds);
type->tp_init(obj, args, kwds);
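The order of operations can be sketched without Python.h. The following is a minimal model, not CPython's actual code: MockType, MockObject, mock_type_call and the demo functions are hypothetical stand-ins, and the real type_call also handles arguments, reference counts, and the case where tp_new returns an object of an unrelated type.

```cpp
#include <cassert>

// Simplified stand-ins for CPython's structures (NOT the real API).
struct MockType;
struct MockObject { MockType* type; bool initialised; };

struct MockType {
    MockObject* (*tp_new)(MockType*);   // creates the object
    int (*tp_init)(MockObject*);        // initialises it; returns -1 on failure
};

// Mirrors the sequence described above: create via tp_new, then
// initialise via tp_init, discarding the object if tp_init fails.
MockObject* mock_type_call(MockType* type) {
    MockObject* obj = type->tp_new(type);
    if (obj == nullptr) return nullptr;
    if (type->tp_init != nullptr && type->tp_init(obj) < 0) {
        delete obj;                     // the real code drops a reference instead
        return nullptr;
    }
    return obj;
}

// Hypothetical slot implementations used to exercise the sketch.
MockObject* demo_new(MockType* t) { return new MockObject{t, false}; }
int demo_init(MockObject* o) { o->initialised = true; return 0; }
int failing_init(MockObject*) { return -1; }
```

Calling mock_type_call on a type whose slots are demo_new and demo_init yields an initialised object; swapping in failing_init yields a null pointer, which is how a failed constructor call surfaces to the caller.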

And our C++ wrapper has inserted functions into the tp_new and tp_init slots of NewStyleClass_PyTypeObject something like this:

typeobject.set_tp_new( extension_object_new );
typeobject.set_tp_init( extension_object_init );

:
    static PyObject* extension_object_new( PyTypeObject* subtype, 
                                              PyObject* args, PyObject* kwds )
    {
        PyObject* pyob = subtype->tp_alloc(subtype,0);

        Bridge* o = reinterpret_cast<Bridge *>( pyob );

        o->m_pycxx_object = nullptr;

        return pyob;
    }

    static int extension_object_init( PyObject* _self, 
                                            PyObject* args, PyObject* kwds )
    {
        Bridge* self{ reinterpret_cast<Bridge*>(_self) };

        // NOTE: observe this is where we invoke the constructor, 
        //       but indirectly (i.e. through final)
        self->m_pycxx_object = new FinalClass{ self, args, kwds };

        return 0;
    }

Note that we need to bind together the Python Derived instance and its corresponding C++ class instance. (Why? Explained below, see 'X'). To do that we are using:

struct Bridge
{
    PyObject_HEAD // <-- a PyObject
    ExtObjBase* m_pycxx_object;
};

Now this bridge raises a question. I'm very suspicious of this design.

Note how memory was allocated for this new PyObject:

        PyObject* pyob = subtype->tp_alloc(subtype,0);

And then we typecast this pointer to Bridge, and use the 4 or 8 (sizeof(void*)) bytes immediately following the PyObject to point to the corresponding C++ class instance (this gets hooked up in extension_object_init as can be seen above).

Now for this to work we require:

a) subtype->tp_alloc(subtype,0) must be allocating an extra sizeof(void*) bytes

b) The PyObject doesn't require any memory beyond sizeof(PyObject_HEAD), because if it did then this would be conflicting with the above pointer

One major question I have at this point is: Can we guarantee that the PyObject that the Python runtime has created for our derived_instance does not overlap into Bridge's ExtObjBase* m_pycxx_object field?

I will attempt to answer it: we are the ones who determine how much memory gets allocated. When we create NewStyleClass_PyTypeObject we feed in how much memory we want this PyTypeObject to allocate for a new instance of this type:

template< TEMPLATE_TYPENAME FinalClass >
class ExtObjBase : public FuncMapper<FinalClass> , public ExtObjBase_noTemplate
{
protected:
    static TypeObject& typeobject()
    {
        static TypeObject* t{ nullptr };
        if( ! t )
            t = new TypeObject{ sizeof(FinalClass), typeid(FinalClass).name() };
                   /*           ^^^^^^^^^^^^^^^^^ this is the bug BTW!
                        The C++ Derived class instance never gets deposited
                        In the memory allocated by the Python runtime
                        (controlled by this parameter)

                        This value should be sizeof(Bridge) -- as pointed out
                        in the answer to the question linked above
                   */

        return *t;
    }
:
};

class TypeObject
{
private:
    PyTypeObject* table;

    // these tables fit into the main table via pointers
    PySequenceMethods*       sequence_table;
    PyMappingMethods*        mapping_table;
    PyNumberMethods*         number_table;
    PyBufferProcs*           buffer_table;

public:
    PyTypeObject* type_object() const
    {
        return table;
    }

    // NOTE: if you define one sequence method you must define all of them except the assigns

    TypeObject( size_t size_bytes, const char* default_name )
        : table{ new PyTypeObject{} }  // {} sets to 0
        , sequence_table{}
        , mapping_table{}
        , number_table{}
        , buffer_table{}
    {
        PyObject* table_as_object = reinterpret_cast<PyObject* >( table );

        *table_as_object = PyObject{ _PyObject_EXTRA_INIT  1, NULL }; 
        // ^ py_object_initializer -- NULL because type must be init'd by user

        table_as_object->ob_type = _Type_Type();

        // QQQ table->ob_size = 0;
        table->tp_name              = const_cast<char *>( default_name );
        table->tp_basicsize         = size_bytes;
        table->tp_itemsize          = 0; // sizeof(void*); // so as to store extra pointer

        table->tp_dealloc           = ...

You can see the size going in as table->tp_basicsize.
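The size requirement can be stated as a small check. This is only a sketch under assumptions: MockHead stands in for the fields PyObject_HEAD expands to in a non-debug build, CxxObject is an opaque placeholder for the wrapped C++ class, and mock_alloc imitates a generic allocator that zero-fills the requested number of bytes. None of these names come from PyCXX or Python.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-in for the object header (assumed layout, for illustration only).
struct MockHead { std::ptrdiff_t ob_refcnt; void* ob_type; };

struct CxxObject;                       // opaque placeholder for the C++ side

// Same shape as Bridge: a header followed by one pointer.
struct MockBridge {
    MockHead head;
    CxxObject* m_pycxx_object;
};

// The pointer field starts at or after the end of the header, so the two
// cannot overlap as long as the header is all the object itself needs.
static_assert(offsetof(MockBridge, m_pycxx_object) >= sizeof(MockHead),
              "pointer field must not overlap the header");

// Imitates an allocator driven by the type's basic size: it returns
// zero-filled memory of exactly the size the type asked for.
void* mock_alloc(std::size_t basicsize) { return std::calloc(1, basicsize); }

// The bridge is only safe to use if the registered size covers all of it.
bool bridge_fits(std::size_t basicsize) { return basicsize >= sizeof(MockBridge); }
```

Under these assumptions, registering sizeof(MockBridge) is sufficient and registering only the header size is not. A larger value such as the size of the full C++ class would also pass the check, but the extra bytes would go unused.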

But now it seems clear to me that PyObject-s generated from NewStyleClass_PyTypeObject will never require additional allocated memory.

Which means that this whole Bridge mechanism is unnecessary.

PyCXX's original technique is looking good: it uses PyObject as a base class of NewStyleClassCXXClass and initialises this base so that the Python runtime's PyObject for d = Derived() is in fact this base. This allows seamless typecasting.

Whenever Python runtime calls a slot from NewStyleClass_PyTypeObject, it will be passing a pointer to d's PyObject as the first parameter, and we can just typecast back to NewStyleClassCXXClass. <-- 'X' (referenced above)

So really my question is: why don't we just do this? Is there something special about deriving from NewStyleClass that forces extra allocation for the PyObject?

I realise I don't understand the creation sequence in the case of a derived class. Eli's post didn't cover that.

I suspect this may be connected with the fact that

    static PyObject* extension_object_new( PyTypeObject* subtype, ...

^ this parameter is named 'subtype'. I don't understand why, and I wonder if this may hold the key.
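One way to picture the 'subtype' parameter: the new function belongs to the base type, but a derived type inherits that slot, and the runtime passes in whichever type is actually being instantiated. The sketch below models only that idea. MockType, make_subclass, instantiate and the byte counts are hypothetical, and the assumption that a subclass may declare a larger instance size than its base is stated here rather than taken from the question.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-ins (NOT the real CPython structures).
struct MockType;
struct MockObject { MockType* type; std::size_t allocated; };

struct MockType {
    const char* name;
    std::size_t basicsize;                      // bytes per instance
    MockType* base;
    MockObject* (*tp_new)(MockType* subtype);   // inherited by subclasses
};

// Plays the role of extension_object_new: it sizes the allocation from
// the type it is handed, which need not be the type that defined it.
MockObject* extension_new(MockType* subtype) {
    return new MockObject{subtype, subtype->basicsize};
}

// Models slot inheritance: the subclass reuses the base's tp_new and may
// ask for extra per-instance bytes (hypothetical amount).
MockType make_subclass(const char* name, MockType* base, std::size_t extra) {
    return MockType{name, base->basicsize + extra, base, base->tp_new};
}

// Models calling the type: the type being called is passed as 'subtype'.
MockObject* instantiate(MockType* type) { return type->tp_new(type); }
```

In this model, instantiating the derived type runs the base's function with subtype pointing at the derived type, so the allocation follows the derived type's size rather than the base's.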

EDIT: I thought of one possible explanation for why PyCXX is using sizeof(FinalClass) for initialisation. It might be a relic from an idea that got tried and discarded. i.e. If Python's tp_new call allocates enough space for the FinalClass (which has the PyObject as base), maybe a new FinalClass can be generated on that exact location using 'placement new', or some cunning reinterpret_cast business. My guess is this might have been tried, found to pose some problem, worked around, and the relic got left behind.
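For reference, the 'placement new' idea mentioned above looks like this in isolation. This is a generic C++ sketch, not code from PyCXX: Widget, construct_in and destroy_in_place are hypothetical names, and std::malloc stands in for memory handed over by some other allocator.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical class used only to show construction in existing storage.
struct Widget {
    int value;
    static int live;                   // constructed-but-not-destroyed count
    explicit Widget(int v) : value(v) { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Runs the constructor inside memory obtained elsewhere; the storage must
// be at least sizeof(Widget) bytes and suitably aligned.
Widget* construct_in(void* storage, int v) { return new (storage) Widget{v}; }

// Runs the destructor without freeing the storage, which remains the
// responsibility of whoever allocated it.
void destroy_in_place(Widget* w) { w->~Widget(); }
```

The owner of the raw memory calls construct_in once, later calls destroy_in_place, and only then releases the memory. Any scheme along these lines has to keep that pairing intact, which may be one reason such an idea would be abandoned.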

Solution

PyCXX is not convoluted. It does have two bugs, but they can be easily fixed without requiring significant changes to the code.

When creating a C++ wrapper for the Python API, one encounters a problem. The C++ object model and the Python new-style object model are very different. One fundamental difference is that C++ has a single constructor that both creates and initializes the object, while Python has two stages: tp_new creates the object and performs minimal initialization (or just returns an existing object), and tp_init performs the rest of the initialization.

PEP 253, which you should probably read in its entirety, says:

The difference in responsibilities between the tp_new() slot and the tp_init() slot lies in the invariants they ensure. The tp_new() slot should ensure only the most essential invariants, without which the C code that implements the objects would break. The tp_init() slot should be used for overridable user-specific initializations. Take for example the dictionary type. The implementation has an internal pointer to a hash table which should never be NULL. This invariant is taken care of by the tp_new() slot for dictionaries. The dictionary tp_init() slot, on the other hand, could be used to give the dictionary an initial set of keys and values based on the arguments passed in.

...

You may wonder why the tp_new() slot shouldn't call the tp_init() slot itself. The reason is that in certain circumstances (like support for persistent objects), it is important to be able to create an object of a particular type without initializing it any further than necessary. This may conveniently be done by calling the tp_new() slot without calling tp_init(). It is also possible that tp_init() is not called, or called more than once -- its operation should be robust even in these anomalous cases.

The entire point of a C++ wrapper is to enable you to write nice C++ code. Say for example that you want your object to have a data member that can only be initialized during its construction. If you create the object during tp_new, then you cannot reinitialize that data member during tp_init. This will probably force you to hold that data member via some kind of a smart pointer and create it during tp_new. This makes the code ugly.

The approach PyCXX takes is to separate object construction into two:

tp_new creates a dummy object with just a pointer to the C++ object, which is created in tp_init. This pointer is initially null.
tp_init allocates and constructs the actual C++ object, then updates the pointer in the dummy object created in tp_new to point to it. If tp_init is called more than once it raises a Python exception.
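The two steps above can be sketched without the Python headers. This is a simplified model rather than PyCXX's code: MockBridge, CxxObject and the mock_tp_* functions are hypothetical, the Python object header is omitted, and a return value of -1 stands in for raising a Python exception.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapped class with a member that can only be set during
// construction, which is what makes single-phase creation awkward.
struct CxxObject {
    const std::string label;
    explicit CxxObject(std::string l) : label(std::move(l)) {}
};

// Dummy object: holds nothing but a pointer to the C++ side.
struct MockBridge {
    CxxObject* cxx = nullptr;
};

// Phase 1: create the dummy with a null pointer.
MockBridge* mock_tp_new() { return new MockBridge{}; }

// Phase 2: construct the real object. Returns 0 on success and -1 if the
// object was already initialised (real code would set an exception).
int mock_tp_init(MockBridge* self, const std::string& label) {
    if (self->cxx != nullptr) return -1;
    self->cxx = new CxxObject{label};
    return 0;
}

// Teardown must cope with phase 2 never having run; deleting a null
// pointer is a no-op, so no extra check is needed.
void mock_tp_dealloc(MockBridge* self) {
    delete self->cxx;
    delete self;
}
```

A second call to mock_tp_init is rejected and leaves the first object untouched, and a bridge that was created but never initialised can still be torn down safely.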

I personally think that the overhead of this approach for my own applications is too high, but it's a legitimate approach. I have my own C++ wrapper around the Python/C API that does all the initialization in tp_new, which is also flawed. There doesn't appear to be a good solution for that.