Search code examples
pythonpython-c-api

Updating elements of an array using the Python3/C API


I have a module method which takes in a python list, and then outputs the same list with all items multiplied by 100.

I've attemped to follow the C intro here as close as possible but still running into issues.

static PyObject *
test_update_list(PyObject *self, PyObject *args)
{
    PyObject *listObj = NULL;
    PyObject *item = NULL;
    PyObject *mult = PyLong_FromLong(100);
    PyObject *incremented_item = NULL;

    if (!PyArg_ParseTuple(args, "O", &listObj))
    {
        return NULL;
    }

    /* get the number of lines passed to us */
    Py_ssize_t numLines = PyList_Size(listObj);

    /* should raise an error here. */
    if (numLines < 0) return NULL; /* Not a list */

    for (Py_ssize_t i=0; i<numLines; i++) {
        // pick the item 
        item = PyList_GetItem(listObj, i);

        if (mult == NULL)
            goto error;

        // increment it
        incremented_item = PyNumber_Add(item, mult);

        if (incremented_item == NULL)
            goto error;

        // update the list item
        if (PyObject_SetItem(listObj, i, incremented_item) < 0)
            goto error;

    }
error:
    Py_XDECREF(item);
    Py_XDECREF(mult);
    Py_XDECREF(incremented_item);
    return listObj;
};

The above complies fine, however when I run in ipython, I get the below error.

If I take away the error handling I get a seg fault.

---------------------------------------------------------------------------
SystemError                               Traceback (most recent call last)
SystemError: null argument to internal routine

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
<ipython-input-3-da275aa3369f> in <module>()
----> 1 testadd.test_update_list([1,2,3])

SystemError: <built-in function ulist> returned a result with an error set

Any help is appreciated.


Solution

  • So you have a number of issues that all need to be corrected. I've listed them all under separate headings so you can go through them one at a time.

    Always returning listObj

    When you get an error in your for loop, you would goto the error label, which was still returning the list. By returning this list you hide that there was an error in your function. You must always return NULL when you expect your function to raise an exception.

    Does not increment listObj ref count on return

    When your function is invoked you are given a borrowed reference to your arguments. When you return one of those arguments you are creating a new reference to your list, and so must increment its reference count. Otherwise the interpreter will have a reference count that is one lower than the number of actual references to the object. This will end up with a bug where the interpreter deallocates your list when there is only 1 reference rather than 0! This could result in a seg fault, or it could in the worst case scenario result in random parts of the program access the that has since been deallocated and allocated for some other object.

    Uses PyObject_SetItem with primitive

    PyObject_SetItem can be used with dicts and other class that implements obj[key] = val. So you cannot supply it with an argument of type Py_ssize_t. Instead, use PyList_SetItem which only accepts Py_ssize_t as its index argument.

    Bad memory handling of item and incremented_item

    PyObject_SetItem and PyList_SetItem both handle decreasing the reference count of the object that was already at the position that was being set. So we don't need to worry about managing the reference count of item as we are only working with a reference borrowed from the list. These pair of functions also steal a reference to incremented_item, and so we don't need to worry about managing its reference count either.

    Memory leak on incorrect arguments

    For example, when you call your function with an int. You will create a new reference to the 100 int object, but because you return NULL rather than goto error, this reference will be lost. As such you need to handle such scenarios differently. In my solution, I move the PyLong_FromLong call to after the arg and type checking. In this way we are only create this new* object once we are guaranteed it will be used.

    Working code

    Side note: I removed the goto statements as there was only one left, and so it made more sense to do the error handling at that point rather than later.

    static PyObject *
    testadd_update_list(PyObject *self, PyObject *args)
    {
        PyObject *listObj = NULL;
        PyObject *item = NULL;
        PyObject *mult = NULL;
        PyObject *incremented_item = NULL;
        Py_ssize_t numLines;
    
        if (!PyArg_ParseTuple(args, "O:update_list", &listObj))
        {
            return NULL;
        }
        if (!PyList_Check(listObj)) {
            PyErr_BadArgument();
            return NULL;
        }
    
        /* get the number of lines passed to us */
        // Don't want to rely on the error checking of this function as it gives a weird stack trace. 
        // Instead, we use Py_ListCheck() and PyErr_BadArgument() as above. Since list is definitely 
        // a list now, then PyList_Size will never throw an error, and so we could use 
        // PyList_GET_SIZE(listObj) instead.
        numLines = PyList_Size(listObj);
    
        // only initialise mult here, otherwise the above returns would create a memory leak
        mult = PyLong_FromLong(100);
        if (mult == NULL) {
            return NULL;
        }
    
        for (Py_ssize_t i=0; i<numLines; i++) {
            // pick the item 
            // It is possible for this line to raise an error, but our invariants should
            // ensure no error is ever raised. `list` is always of type list and `i` is always 
            // in bounds.
            item = PyList_GetItem(listObj, i);
    
            // increment it, and check for type errors or memory errors
            incremented_item = PyNumber_Add(item, mult);
            if (incremented_item == NULL) {
                // ERROR!
                Py_DECREF(mult);
                return NULL;
            }
    
            // update the list item
            // We definitely have a list, and our index is in bounds, so we should never see an error 
            // here.
            PyList_SetItem(listObj, i, incremented_item);
            // PyList_SetItem steals our reference to incremented_item, and so we must be careful in 
            // how we handle incremented_item now. Either incremented_item will not be our 
            // responsibility any more or it is NULL. As such, we can just remove our Py_XDECREF call
        }
    
        // success!
        // We are returning a *new reference* to listObj. We must increment its ref count as a result!
        Py_INCREF(listObj);
        Py_DECREF(mult);
        return listObj;
    }
    

    Footnote:

    * PyLong_FromLong(100) doesn't actually create a new object, but rather returns a new reference to an existing object. Integers with low values (0 <= i < 128 I think) are all cached and this same object is returned when needed. This is an implementation detail that is meant to avoid high levels of allocating and deallocating integers for small values, and so improve the performance of Python.