
Deriving from arbitrary Python class with C API and `tp_basicsize` of `object`?


I am trying to define a function that creates a Python class through the C API. The class should derive from an arbitrary Python base type and have an extra field, void* my_ptr, in its raw C-level object layout. I want it to reuse Python's __dict__ functionality.

I am not doing this in C, so I don't have access to the C macros. My initial attempt looks like this (pseudocode):

PyType Derive(PyType base) {
  var newType = new PyType(...);

  newType.tp_flags = HeapType | BaseType; // <- this is important,
    // one should be able to add new attributes and otherwise use __dict__, subclass, etc

  ... filling in other things ...

  int my_ptr_offset = base.tp_basicsize; // put my_ptr immediately after base type data
  newType.tp_basicsize = my_ptr_offset + sizeof(void*); // instances of new type
    // will have instance size = base size + size of my_ptr

  ...

  return newType;
}

The problem is that this code breaks down when base is builtins.object. In that case tp_basicsize does not count the field that would normally store __dict__, so my_ptr_offset ends up pointing at that field, and the consumer of my_ptr eventually overwrites it.

Any simple Python class that derives from object does not have that problem. E.g.:

class MySimpleClass: pass

On a 64-bit machine:

PyType mySimpleClass = ...;
PyType object = ...;
mySimpleClass.tp_basicsize // <- 32, includes __dict__
object.tp_basicsize // <- 16, does not include space for __dict__
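
The same numbers can be checked from pure Python, since type objects expose these slots as __basicsize__, __dictoffset__ and __weakrefoffset__. The snippet below is only a rough inspection sketch; layout_summary is a hypothetical helper name, and the exact sizes printed depend on the interpreter version and build, so no particular output is claimed here.

```python
class MySimpleClass:
    pass


def layout_summary(tp):
    """Collect the layout-related slots of a type (hypothetical helper)."""
    return {
        "basicsize": tp.__basicsize__,          # mirrors tp_basicsize
        "dictoffset": tp.__dictoffset__,        # mirrors tp_dictoffset, 0 = no __dict__
        "weakrefoffset": tp.__weakrefoffset__,  # mirrors tp_weaklistoffset
    }


# Print the slots for the types discussed in the question.
for tp in (object, MySimpleClass, Exception):
    print(tp.__name__, layout_summary(tp))
```

Note that the 16-byte gap between the two sizes above is two pointers, not one. My understanding is that the second pointer is the __weakref__ slot, which is tracked separately through tp_weaklistoffset, but that is worth verifying on the interpreter in use.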

I also noticed a similar problem with builtins.Exception.

Right now I just check for Exception and object manually and add 2 * sizeof(void*) to tp_basicsize, which seems to work. But I'd like to understand how to handle that layout properly.


Solution

  • I think the information you want is in tp_dictoffset of the base. If it is 0, the base doesn't have a __dict__; any other value means it does.

    I'm a little unclear on how you're creating your types, but at least when the type is created by calling PyType_Type (which is what happens internally when you write class X: in Python), a dict is added unless __slots__ is defined. That sounds like both what you want to happen and what is happening. This is detailed under "Inheritance" in the section of documentation I linked.

    Therefore, if the base's tp_dictoffset == 0 (and assuming you aren't defining __slots__), add sizeof(PyObject*) to the offset to account for the dictionary that is implicitly added.
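
The rule above can be sketched in Python using the type attributes that mirror the C slots. This is a minimal illustration, not a definitive implementation: my_ptr_offset and its reserve_weakref flag are hypothetical names, and the sketch assumes the layout described in the question, where the implicitly added slots are placed right after the base's data. The optional flag reflects my assumption that a __weakref__ pointer is added in the same way when the base's tp_weaklistoffset is 0, which would explain the two-pointer difference observed in the question.

```python
import ctypes

# Size of one pointer on this platform (sizeof(void*) / sizeof(PyObject*)).
POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)


def my_ptr_offset(base, reserve_weakref=False):
    """Offset at which an extra pointer field could go after `base` data.

    Hypothetical helper: applies the tp_dictoffset rule from the answer.
    """
    offset = base.__basicsize__
    # Base has no __dict__ slot, so one will be added implicitly: skip it.
    if base.__dictoffset__ == 0:
        offset += POINTER_SIZE
    # Assumption: a __weakref__ slot is added likewise for fixed-size bases.
    if reserve_weakref and base.__weakrefoffset__ == 0 and base.__itemsize__ == 0:
        offset += POINTER_SIZE
    return offset


print(my_ptr_offset(object))
print(my_ptr_offset(object, reserve_weakref=True))
print(my_ptr_offset(Exception))
```

With reserve_weakref=True the result for object matches the manual 2 * sizeof(void*) adjustment mentioned in the question, while bases that already carry a dict slot are left at their own basic size.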