Search code examples
pythoncobjectbinary-treepython-c-api

Binary Tree implemented in C using Python Api converts PyObject into Bytes


My PyObject becomes a Bytes object while getting it's value

So lately, I am doing a project in C, where I implement a few type of trees, to be able to use them in python, since a C btree implementation is way faster than a Python one. (Ikr there are a few available libraries, but since I have more free time during the lockdown, I thought, I could do my own library.)

Everything is working fine, untill I would like to find an element, and print its value, all at the same line. When I do that, my Node object turns into a Bytes object, without any further assignment.

Further Information

OS: Ubuntu 16.04.
Python version: Python 3.5.2
GCC: 5.4.0-6 Ubuntu

Python Code:

import mylib
import random

maxv = 10

def addValues(tree, values):
    for value in values:
        tree.insert(value)

def main():
    my_list = [i for i in range(maxv)]
    #random.shuffle(my_list)
    tree    = mylib.BinaryTree('test')
    addValues(tree, my_list)
    print(tree.find(3).getValue())
    print(tree.sort(False))
    
main()

Expected Output (Works if line before the last in main func is print(tree.find(3))):

3
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Output I got with the above test code:

3
Segmentation fault

The Segmentation fault occurs, because the node contains the value 3 became a Bytes object while printing its value. The below code will print the new Node.

import mylib
import random

maxv = 10

def addValues(tree, values):
    for value in values:
        tree.insert(value)

def main():
    my_list = [i for i in range(maxv)]
    #random.shuffle(my_list)
    tree    = mylib.BinaryTree('test')
    addValues(tree, my_list)
    print(tree.find(0).getValue()) #0 is going to be the root node's value, since random.shuffle is not happening.
    print(tree.root)
    print(tree.sort(False))
    
main()

Output:

0
b'0'
Segmentation fault

I spent days debugging this, and since I am definetly not a master in C programming, (You will see below), I could not find the error. I would like to move on, and implement more features, but I was unable to find the error. There is a chance I missed something so trivial, or a thing I am not aware of. I am not an experienced C programmer sadly. :/

My module contains much more code, I will not post what is not necessary for this problem. If you think you should see more to understand my code, feel free to tell me! Also I hope my code is well-readable!

Can someone explain what is happening exactly?
Thank you!

C code related here:

Recursive find function:

/** find node by value */
/*static*/ PyObject *find(Node *root, int value) {
    if((PyObject *) root == Py_None) {
        return NULL;
    }
    if(root->value == value) {
        return (PyObject *) root;
    }
    if(value < root->value) {
        return find((Node *) root->left,  value);
    }
    if(value > root->value) {
        return find((Node *) root->right, value);
    }
}

Controller, this method is being called from Python, and this will call the above recursive function:

/** Find node by value */
/*static*/ PyObject *BinaryTree_Find(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = find((Node *) self->root, value);
    if(result == NULL) {
        result = Py_None;
    }
    
    return (PyObject *) result;
}

Minimal, Reproducible Example

You can use the Python code above to drive this module, and the below code to create the library.

from distutils.core import setup, Extension
setup(
    name = 'mylib', version = '1.0',  \
    ext_modules = [Extension('mylib',
        [
            'custom.c',
        ])])

And run by: sudo python3 build.py install

#include <Python.h>
#include "structmember.h"

/* Header Block */
#define MODULE_NAME "mylib"
#define MODULE_DOC  "Custom Binary Tree Implementations in C for Python"

/** Node */
typedef struct {
    PyObject_HEAD
    int value;
    PyObject *left;
    PyObject *right;
} Node;

/** BinaryTree */
typedef struct {
    PyObject_HEAD
    int elem;
    PyObject *name;
    PyObject *root;
} BinaryTree;

/** NODE */

/** creating node, set inital values */
/*static*/ PyObject *newNode(PyTypeObject *type, PyObject *args, PyObject *kwds) {
    Node *self;
    
    self = (Node *) type->tp_alloc(type, 0);
    if(self == NULL) {
        return NULL;
    }
    
    self->value = 0;
    self->left  = NULL;
    self->right = NULL;
    
    return (PyObject *) self;
}

/** init function of node, set parameter as value */
/*static*/ int Node_Init(Node *self, PyObject *args, PyObject *kwds) {
    int value;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return -1;
    }
    
    self->value = value;
    self->left  = Py_None;
    self->right = Py_None;
    
    return 0;
}

/** deallocation of node */
/*static*/ void Node_Dealloc(Node *self) {
    if(self->left  != Py_None) Py_DECREF(self->left);
    if(self->right != Py_None) Py_DECREF(self->right);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

/** return value of node */
/*static*/ PyObject *Node_GetValue(Node *self, PyObject *args, PyObject *kwds) {
    return Py_BuildValue("i", self->value);
}

/** node variables */
/*static*/ PyMemberDef Node_Members[] = {
    {
        "left",
        T_OBJECT_EX,
        offsetof(Node, left),
        0,
        "Left Child"
    },
    {
        "right",
        T_OBJECT_EX,
        offsetof(Node, right),
        0,
        "Right Child"
    }
};

/** node methods */
/*static*/ PyMethodDef Node_Methods[] = {
    {
        "getValue",
        (PyCFunction) Node_GetValue,
        METH_NOARGS,
        "Get value of node"
    },
    {NULL}
};

/** node type def */
/*static*/ PyTypeObject Node_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name      = "BinaryTree Node",
    .tp_doc       = "Custom Binary Tree Node Object",
    .tp_basicsize = sizeof(Node),
    .tp_itemsize  = 0,
    .tp_flags     = Py_TPFLAGS_DEFAULT,
    .tp_new       = newNode,
    .tp_init      = (initproc) Node_Init,
    .tp_dealloc   = (destructor) Node_Dealloc,
    .tp_members   = Node_Members,
    .tp_methods   = Node_Methods,
};

/** CONTROLLER */

/** insert new node into the tree */
/*static*/ PyObject *insert(Node *root, int value, PyObject *args) {
    if((PyObject *) root == Py_None) {
        return PyObject_CallObject((PyObject *) &Node_Type, args);
    }
    if(root->value == value) {
        return NULL;
    }
    if(value < root->value) {
        root->left  = insert((Node *) root->left, value, args);
    }
    if(value > root->value) {
        root->right = insert((Node *) root->right, value, args);
    }
    return (PyObject *) root;
}

/** find node by value */
/*static*/ PyObject *find(Node *root, int value) {
    if((PyObject *) root == Py_None) {
        return NULL;
    }
    if(root->value == value) {
        return (PyObject *) root;
    }
    if(value < root->value) {
        return find((Node *) root->left,  value);
    }
    if(value > root->value) {
        return find((Node *) root->right, value);
    }
}

/** sort tree asc / inorder traversal */
/*static*/ void sortAsc(Node *root, PyObject *list) {
    if((PyObject *) root == Py_None) {
        return;
    }
    sortAsc((Node *) root->left,  list);
    PyList_Append(list, Py_BuildValue("i", root->value));
    sortAsc((Node *) root->right, list);
}

/** sort tree desc */
/*static*/ void sortDesc(Node *root, PyObject *list) {
    if((PyObject *) root == Py_None) {
        return;
    }
    sortDesc((Node *) root->right,  list);
    PyList_Append(list, Py_BuildValue("i", root->value));
    sortDesc((Node *) root->left, list);
}

/** BinaryTree */

/** creating binary tree, set inital values */
/*static*/ PyObject *newBinaryTree(PyTypeObject *type, PyObject *args, PyObject *kwds) {
    BinaryTree *self;
    
    self = (BinaryTree *) type->tp_alloc(type, 0);
    if(self == NULL) {
        return NULL;
    }
    
    self->name = PyUnicode_FromString("");
    if(self->name == NULL) {
        Py_DECREF(self);
        return NULL;
    }
    
    self->elem = 0;
    self->root = Py_None;
    
    return (PyObject *) self;
}

/** set parameters as variables */
/*static*/ int BinaryTree_Init(BinaryTree *self, PyObject *args, PyObject *kwds) {
    PyObject *name, *temp;
    
    if(!PyArg_ParseTuple(args, "O", &name)) {
        return -1;
    }
    
    if(name) {
        temp = self->name;
        Py_INCREF(name);
        self->name = name;
        Py_XDECREF(temp);
    }
    
    self->elem = 0;
    
    return 0;
}

/** deallocation of binary tree */
/*static*/ void BinaryTree_Dealloc(BinaryTree *self) {
    Py_XDECREF(self->name);
    Py_XDECREF(self->root);
    Py_TYPE(self)->tp_free((PyObject *) self);
}

/** insert controller, set parameter as value, drive inserting, return true */
/*static*/ PyObject *BinaryTree_Insert(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = insert((Node *) self->root, value, args);
    if(result == NULL) {
        Py_XDECREF(result);
        PyErr_SetString(PyExc_TypeError, "Already existing value!");
        return NULL;
    }
    
    self->root = result;
    self->elem++;
    
    Py_RETURN_TRUE;
}

/** Find node by value */
/*static*/ PyObject *BinaryTree_Find(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = find((Node *) self->root, value);
    if(result == NULL) {
        result = Py_None;
    }
    
    return (PyObject *) result;
}

/** sort binary tree asc */
/*static*/ PyObject *BinaryTree_Sort(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int rev;
    PyObject *list = PyList_New(0);
    
    if(!PyArg_ParseTuple(args, "p", &rev)) {
        Py_XDECREF(list);
        return NULL;
    }
    
    switch(rev) {
        case 0:
            sortAsc( (Node *) self->root, list);
            break;
        case 1:
            sortDesc((Node *) self->root, list);
            break;
        default:
            sortAsc( (Node *) self->root, list);
    }
    
    return list;
}

/** binary tree variables */
/*static*/ PyMemberDef BinaryTree_Members[] = {
    {
        "name",
        T_OBJECT_EX,
        offsetof(BinaryTree, name),
        0,
        "BinaryTree's unique name"
    },
    {
        "root",
        T_OBJECT_EX,
        offsetof(BinaryTree, root),
        0,
        "BinaryTree's root"
    },
    {NULL}
};

/** binary tree methods */
/*static*/ PyMethodDef BinaryTree_Methods[] = {
    {
        "insert",
        (PyCFunction) BinaryTree_Insert,
        METH_VARARGS,
        "Insert new node."
    },
    {
        "find",
        (PyCFunction) BinaryTree_Find,
        METH_VARARGS,
        "Find node by value."
    },
    {
        "sort",
        (PyCFunction) BinaryTree_Sort,
        METH_VARARGS,
        "Sort tree."
    },
    {NULL}
};

/** binary tree type def */
/*static*/ PyTypeObject BinaryTree_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    .tp_name      = "BinaryTree",
    .tp_doc       = "Custom Binary Tree Object",
    .tp_basicsize = sizeof(BinaryTree),
    .tp_itemsize  = 0,
    .tp_flags     = Py_TPFLAGS_DEFAULT,
    .tp_new       = newBinaryTree,
    .tp_init      = (initproc) BinaryTree_Init,
    .tp_dealloc   = (destructor) BinaryTree_Dealloc,
    .tp_members   = BinaryTree_Members,
    .tp_methods   = BinaryTree_Methods,
};

/** MODULE */

/** Module def */
static struct PyModuleDef mylib_module = {
    PyModuleDef_HEAD_INIT,
    MODULE_NAME, /* name of module */
    MODULE_DOC,  /* module documentation, may be NULL */
    -1,          /* size of per-interpreter state of the module */
};

/** Module init */
PyMODINIT_FUNC PyInit_mylib(void) {
    PyObject *mod;
    
    if(PyType_Ready(&BinaryTree_Type) < 0) {
        return NULL;
    }
    
    if(PyType_Ready(&Node_Type) < 0) {
        return NULL;
    }
    
    mod = PyModule_Create(&mylib_module);
    if(mod == NULL) {
        return NULL;
    }
    
    Py_INCREF(&BinaryTree_Type);
    if(PyModule_AddObject(mod, "BinaryTree", (PyObject *) &BinaryTree_Type) < 0) {
        Py_DECREF(&BinaryTree_Type);
        Py_DECREF(mod);
        return NULL;
    }
    
    Py_INCREF(&Node_Type);
    if(PyModule_AddObject(mod, "Node", (PyObject *) &Node_Type) < 0) {
        Py_DECREF(&Node_Type);
        Py_DECREF(mod);
        return NULL;
    }
    
    return mod;
}

EDIT

print(tree.root.getValue())
print(tree.find(tree.root.value).getValue())

Works fine! The error lies in the find function!

Thanks to DavidW, he pointed out it must be a reference error somewhere, so I counted, how many references are there refering to that Node object before, and after printing it's value. The results:

2                  # references before
56                 # value before, this line accessed the Node.
b'56\n'            # value after 
139690028005977    # references after
Segmentation fault # segfault while sorting

So while using Find, the Node's reference count gets decreased, therefore all Node objects are deallocated/deleted. Increasing before returning, and custom setter/getter methods with INCREF/DECREF-s are also ineffective.

DONE

Following DavidW's suggestion, I managed to find the answer! I did not increased the reference count, therefore the Nodes where deallocated right after searching. The objects were not behaving the same way when I did not requested the Node's value, or when I did not print the value out.I suggest the print function decreased the reference count somehow.

Right find controller:

/** Find nood by value */
/*static*/ PyObject *BinaryTree_Find(BinaryTree *self, PyObject *args, PyObject *kwds) {
    int value;
    PyObject *result;
    
    if(!PyArg_ParseTuple(args, "i", &value)) {
        return NULL;
    }
    
    result = find((Node *) self->root, value);
    if(result == NULL) {
        result = Py_None;
    }
    
    Py_INCREF(result); // Increasing the Node object's reference count here
    
    return (PyObject *) result;
}

Solution

  • As discussed in the comments it was a reference counting error. A C API function must return what Python calls a "new reference". That means either returning something that has been purposefully created inside the function (for example the result of PyList_New) in incrementing the reference count of an existing object.

    In BinaryTree_Find specifically you were not returning a new reference. As a result Python would end up freeing an object that still formed part of your BinaryTree. Once that's happened you can get all kinds of strange and confusing behaviour. I suggested adding Py_INCREF(result).

    To help diagnose this kind of problem it was be worth adding printf statements to object constructors and destructors (just as a temporary debugging measure) so you can check your assumptions about when they are allocated and freed.