python, c, numpy, python-c-api

Why does PyLong_AsUnsignedLongLong function fail to convert a numpy.uint64 element, whereas PyLong_AsLongLong succeeds?


I'm working on a C-extension for Python which implements a method that converts a list of numpy elements (numpy.uint64 in this case) to the unsigned long long C-type (using the PyLong_AsUnsignedLongLong function). The elements of the list are then summed up and the resulting sum is returned to the Python layer.

To create the module I wrote this code in testmodule.c:

#include <Python.h>

static PyObject *method_sum_list_u64(PyObject *self, PyObject *args);

static PyMethodDef testmoduleMethods[] = {
    {"sum_list_u64", method_sum_list_u64, METH_VARARGS, "docs"},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef testmodule = {
    PyModuleDef_HEAD_INIT,
    "testmodule",
    "docs",
    -1,
    testmoduleMethods
};

PyMODINIT_FUNC PyInit_testmodule(void) {
    return PyModule_Create(&testmodule);
}

And here is my method:

static PyObject *method_sum_list_u64(PyObject *self, PyObject *args) {
    uint64_t sum = 0;
    PyObject *input_list;

    if (!PyArg_ParseTuple(args, "O!", &PyList_Type, &input_list))
    {
        return NULL;
    }

    Py_ssize_t data_points = PyList_Size(input_list);

    for (Py_ssize_t i = 0; i < data_points; i++)
    {
        PyObject *item = PyList_GetItem(input_list, i);
        sum += PyLong_AsUnsignedLongLong(item);
    }

    return PyLong_FromUnsignedLongLong(sum);
}

My setup.py file:

from setuptools import setup, Extension

def main():
    setup(name="testmodule",
          version="1.0.1",
          description="Python interface for the testmodule C library function",
          ext_modules=[Extension("testmodule", ["testmodule.c"])])

if __name__ == "__main__":
    main()

And a simple test script called mytest.py:

import numpy as np
import testmodule

input_list = [np.uint64(1000)]
print(testmodule.sum_list_u64(input_list))

To reproduce the error I run:

$ python setup.py install
$ python mytest.py
TypeError: an integer is required

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mytest.py", line 5, in <module>
    print(testmodule.sum_list_u64(input_list))
SystemError: <built-in function sum_list_u64> returned a result with an error set

Now, if I replace PyLong_AsUnsignedLongLong with PyLong_AsLongLong everything works fine. Why does PyLong_AsUnsignedLongLong fail and PyLong_AsLongLong doesn't?


Solution

  • isinstance(numpy.uint64(5), int) returns False: Numpy scalar integer types are not subclasses of the Python integer type. They have different internal structures and so will not work with a C API function that expects a Python integer (notably, the Python integer type can handle an arbitrarily large integer while the Numpy types are limited to the size of the C type used internally).

    If you read the documentation you see that PyLong_AsLongLong will first attempt a conversion to a Python int (by calling the object's __index__() method if it is not already an int), whereas PyLong_AsUnsignedLongLong only accepts an instance of PyLongObject and raises TypeError for anything else. I don't know why the two differ and I don't think it would be easy to find out. However, it is the documented behaviour and it explains what you're seeing.
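    The difference can be reproduced without building the extension. Here is a minimal sketch, assuming CPython 3.8 or later. It calls both C API functions through ctypes.pythonapi. The FakeUInt64 class is a hypothetical stand-in for numpy.uint64 (not an int subclass, but convertible through __index__), so Numpy is not needed to run it.

```python
import ctypes

# ctypes.pythonapi exposes the CPython C API and raises any exception
# the C function sets, which makes the two functions easy to compare.
api = ctypes.pythonapi
api.PyLong_AsLongLong.argtypes = [ctypes.py_object]
api.PyLong_AsLongLong.restype = ctypes.c_longlong
api.PyLong_AsUnsignedLongLong.argtypes = [ctypes.py_object]
api.PyLong_AsUnsignedLongLong.restype = ctypes.c_ulonglong


class FakeUInt64:
    """Hypothetical stand-in for numpy.uint64: int-like, but not an int."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value

    __int__ = __index__


item = FakeUInt64(1000)
print(isinstance(item, int))  # False, as with numpy.uint64

# PyLong_AsLongLong converts the object through __index__ first.
signed_result = api.PyLong_AsLongLong(item)
print(signed_result)

# PyLong_AsUnsignedLongLong insists on a real int and sets TypeError.
try:
    api.PyLong_AsUnsignedLongLong(item)
    unsigned_error = None
except TypeError as exc:
    unsigned_error = exc
print(repr(unsigned_error))
```

    The TypeError caught here should be the same "an integer is required" error that shows up in the traceback from the extension.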

    A few options:

    • Convert everything to a Python int (a PyLongObject) first. This method is probably the most robust (it'll accept any sensible data type you pass into it), but does involve creating intermediate Python objects (remember to Py_DECREF them when you're done with them). Use PyObject_CallFunctionObjArgs((PyObject *)&PyLong_Type, item, NULL), or more directly PyNumber_Long(item) or PyNumber_Index(item). You can then use PyLong_AsUnsignedLongLong since you now definitely have the correct type.

    • If you're always passing Numpy scalar types then use the Numpy function PyArray_ScalarAsCtype to get the data from that quickly and directly. You will want some error and type checking to make sure that you have actually received a numpy.uint64.

    • Consider passing a Numpy array rather than a list of numpy.uint64. It isn't clear what you gain from using isolated Numpy scalar types, while an array is stored internally as a C array of uint64_t that you could quickly iterate over.

    • Consider dropping the numpy.uint64s and just using plain Python ints in your list since I don't see what you gain from using the Numpy type. Then you could call PyLong_AsUnsignedLongLong.
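    The first option can be sketched without compiling anything. This assumes CPython and again drives the C API through ctypes.pythonapi. The names sum_list_u64_sketch and FakeUInt64 are hypothetical and only mirror the shape of the C loop. In the real extension the same two calls would replace the single PyLong_AsUnsignedLongLong(item), with a Py_DECREF on the intermediate object.

```python
import ctypes

api = ctypes.pythonapi
# PyNumber_Index returns a new reference to a real int; ctypes manages it.
api.PyNumber_Index.argtypes = [ctypes.py_object]
api.PyNumber_Index.restype = ctypes.py_object
api.PyLong_AsUnsignedLongLong.argtypes = [ctypes.py_object]
api.PyLong_AsUnsignedLongLong.restype = ctypes.c_ulonglong

UINT64_MASK = (1 << 64) - 1


class FakeUInt64:
    """Hypothetical stand-in for numpy.uint64: int-like, but not an int."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


def sum_list_u64_sketch(items):
    """Normalise each item to a real int, then convert it as unsigned."""
    total = 0
    for item in items:
        # Raises TypeError if the item is not int-like.
        as_int = api.PyNumber_Index(item)
        # Raises OverflowError if the value does not fit in 64 unsigned bits.
        total += api.PyLong_AsUnsignedLongLong(as_int)
    # uint64_t arithmetic in C wraps around, so mimic that here.
    return total & UINT64_MASK


print(sum_list_u64_sketch([FakeUInt64(1000), 5]))
```

    Because the item is normalised first, plain ints and int-like objects can be mixed in the same list.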


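    You're also missing all error checking in your function. Most C API calls have a return value that indicates an error (usually NULL but sometimes different). For example, PyList_GetItem returns NULL on failure, while PyLong_AsUnsignedLongLong returns (unsigned long long)-1, so you also need PyErr_Occurred() to tell an error apart from a genuine value. Checking for this is important. Do not skip it!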
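    As a concrete case of an error return that is not NULL: PyLong_AsUnsignedLongLong signals failure by returning (unsigned long long)-1, which is also what a valid 2**64 - 1 converts to. A small sketch of that overlap follows, assuming CPython and using ctypes.pythonapi purely for illustration (ctypes checks for a pending exception itself and raises it, which is the check the C code has to do by hand with PyErr_Occurred()).

```python
import ctypes

api = ctypes.pythonapi
api.PyLong_AsUnsignedLongLong.argtypes = [ctypes.py_object]
api.PyLong_AsUnsignedLongLong.restype = ctypes.c_ulonglong

# The value C code sees when the conversion fails: (unsigned long long)-1.
error_sentinel = ctypes.c_ulonglong(-1).value

# The largest uint64 converts successfully to exactly the same bit pattern.
largest = api.PyLong_AsUnsignedLongLong(2**64 - 1)
print(largest == error_sentinel)

# An out-of-range value fails; ctypes surfaces the pending OverflowError.
try:
    api.PyLong_AsUnsignedLongLong(2**64)
    overflow_error = None
except OverflowError as exc:
    overflow_error = exc
print(repr(overflow_error))
```

    So in the C loop, comparing the result against the sentinel alone is not enough; the pending-exception check is what distinguishes the two cases.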