How to get format-string for data of a ctypes-pointer

Given a ctypes-pointer, for example double**:

import ctypes
data=(ctypes.POINTER(ctypes.c_double)*4)()   #  results in [NULL, NULL, NULL, NULL]

is it possible to obtain a format string, which describes the memory layout of the data?

Right now, I create a memoryview to get this information, which feels somewhat silly:

view=memoryview(data)
print(view.format)   # prints: &<d

Is there a more direct way with less overhead? Maybe through using the C-API?

One could fill data with meaningful values, if this is of any help:

import ctypes
data=(ctypes.POINTER(ctypes.c_double)*2)(
             (ctypes.c_double*2)(1.0,2.0), 
             (ctypes.c_double*1)(3.0))  

#  results in [ 
#               ptr0 -> [1,2],
#               ptr1 -> [3]
#             ]   
print(data[1][0])  #  prints 3.0

Solution

It seems that there is nothing fundamentally better than memoryview(data).format. However, one can speed this up a little by using the C-API.

The format string (which extends the struct format-string syntax, as described in PEP 3118) is calculated recursively and stored in the format member of the StgDictObject, which can be found in the tp_dict field of the types of ctypes arrays/pointers:

typedef struct {
    PyDictObject dict;          /* first part identical to PyDictObject */
    ...
    /* pep3118 fields, pointers neeed PyMem_Free */
    char *format;
    int ndim;
    Py_ssize_t *shape;
    ...
} StgDictObject;
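
To illustrate the recursive construction, here is a small sketch that reads the format through memoryview for increasingly nested types. The helper name buffer_format is made up for this example, and the byte-order prefix (< or >) depends on the platform:

```python
import ctypes

def buffer_format(ctype):
    """Return the PEP 3118 format string exported by an instance of ctype."""
    return memoryview(ctype()).format

double_fmt = buffer_format(ctypes.c_double)
ptr_fmt = buffer_format(ctypes.POINTER(ctypes.c_double))
ptr_ptr_fmt = buffer_format(ctypes.POINTER(ctypes.POINTER(ctypes.c_double)))
arr_fmt = buffer_format(ctypes.POINTER(ctypes.c_double) * 4)

# Each pointer level prepends one "&" to the format of the pointed-to type;
# an array reuses the format of its items (its length goes into shape).
print(double_fmt, ptr_fmt, ptr_ptr_fmt, arr_fmt)
```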

This format field is accessed only during the recursive calculation and when a buffer is exported, which is how memoryview gets this information:

static int PyCData_NewGetBuffer(PyObject *myself, Py_buffer *view, int flags)
{
    ...
    /* use default format character if not set */
    view->format = dict->format ? dict->format : "B";
    ...
    return 0;
}
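
The same export also carries the other PEP 3118 fields (ndim, shape), so they can be inspected from Python in the same way. A minimal sketch, with variable names chosen only for this example:

```python
import ctypes

# 1-d array of double* (the example from the question)
ptr_array = (ctypes.POINTER(ctypes.c_double) * 4)()
view_1d = memoryview(ptr_array)

# 2-d array of doubles: an array of 2 arrays of 3 doubles
matrix = ((ctypes.c_double * 3) * 2)()
view_2d = memoryview(matrix)

print(view_1d.format, view_1d.ndim, view_1d.shape, view_1d.itemsize)
print(view_2d.format, view_2d.ndim, view_2d.shape)
```

The format describes a single item; the array dimensions are reported through ndim and shape.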

Now we can use the C-API to populate a buffer (without creating the actual memoryview), here implemented in Cython:

%%cython

from cpython cimport buffer

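def get_format_via_buffer(obj):
    cdef buffer.Py_buffer view
    # ask the exporter to fill in the format field; raises if obj has no buffer
    buffer.PyObject_GetBuffer(obj, &view, buffer.PyBUF_FORMAT|buffer.PyBUF_ANY_CONTIGUOUS)
    # copy the C string into a bytes object before the buffer is released
    cdef bytes format = view.format
    buffer.PyBuffer_Release(&view)
    return format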

This version is about 3 times faster than via memoryview:

import ctypes
c=(ctypes.c_int*3)()

%timeit get_format_via_buffer(c)   #  295 ns ± 10.3 ns
%timeit memoryview(c).format       #  936 ns ± 7.43 ns

On my machine, there is about 160 ns of overhead for calling a def-function and about 50 ns for creating a bytes-object.
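
Outside of IPython, such numbers can be reproduced roughly with the timeit module. The following is only a sketch: the helper names noop and ns_per_call are invented here, the extra lambda adds some call overhead of its own, and the results will vary between machines:

```python
import ctypes
import timeit

c = (ctypes.c_int * 3)()

def noop(obj):
    """Baseline: measures only the cost of a Python-level call."""
    return None

def ns_per_call(func, number=50_000):
    """Average time of func(c) in nanoseconds over `number` calls."""
    total = timeit.timeit(lambda: func(c), number=number)
    return total / number * 1e9

baseline_ns = ns_per_call(noop)
memoryview_ns = ns_per_call(lambda obj: memoryview(obj).format)

print(f"call overhead:     {baseline_ns:.0f} ns")
print(f"memoryview.format: {memoryview_ns:.0f} ns")
```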

Even if it doesn't make much sense to optimize this further due to the unavoidable overhead, there is still at least theoretical interest in how it could be sped up.

If one really wants to shave off the cost of filling out the Py_buffer struct as well, then there is no clean way: the ctypes module isn't part of the Python C-API (it is not in the include directory), so the way forward is to repeat the solution Cython uses for array.array, i.e. hardcoding the memory layout of the object (which makes this solution brittle, because the hardcoded copy can get out of sync with the actual memory layout of StgDictObject).

Here with Cython and without error-checking:

%%cython -a  
from cpython cimport PyObject

# emulate memory-layout (i.e. copy definitions from ctypes.h)
cdef extern from *:
    """
    #include <Python.h>

    typedef struct _ffi_type
    {
      size_t size;
      unsigned short mem[2];
      struct _ffi_type **elements;
    } ffi_type;

    typedef struct {
        PyDictObject dict;          /* first part identical to PyDictObject */

        Py_ssize_t size[3];            /* number of bytes,alignment requirements,number of fields */
        ffi_type ffi_type_pointer;
        PyObject *proto;            /* Only for Pointer/ArrayObject */
        void *setfunc[3];          

        /* Following fields only used by PyCFuncPtrType_Type instances */
        PyObject *argtypes[4];       
        int flags;                  /* calling convention and such */

        /* pep3118 fields, pointers neeed PyMem_Free */
        char *format;
        int ndim;

    } StgDictObject;
    """

    ctypedef struct StgDictObject:
        char *format


def get_format_via_hack(obj):
    cdef PyObject *p =<PyObject *>obj
    cdef StgDictObject *dict = <StgDictObject *>(p.ob_type.tp_dict)
    return dict.format

And it is as fast as it gets:

%timeit get_format_via_hack(c) # 243 ns ± 14.5 ns
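
Because the layout is hardcoded, it seems prudent to compare the result against memoryview before relying on it. A minimal sketch of such a check follows; formats_agree and reference_getter are hypothetical names, and the stand-in getter only exists so that the snippet runs without the Cython module. Note that a badly mismatched layout may crash the interpreter instead of returning a wrong string, so this check cannot catch every problem:

```python
import ctypes

def formats_agree(getter, samples=None):
    """Return True if getter(obj) matches memoryview(obj).format for all samples."""
    if samples is None:
        samples = [
            ctypes.c_double(),
            (ctypes.c_int * 3)(),
            (ctypes.POINTER(ctypes.c_double) * 4)(),
        ]
    for obj in samples:
        # the Cython functions return bytes, memoryview.format returns str
        expected = memoryview(obj).format.encode("ascii")
        if getter(obj) != expected:
            return False
    return True

# Stand-in getter (assumption for this example); with the Cython cell loaded,
# one would pass get_format_via_hack or get_format_via_buffer instead.
reference_getter = lambda obj: memoryview(obj).format.encode("ascii")
print(formats_agree(reference_getter))
```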