I have a custom Python module for fuzzy string search that implements Levenshtein distance calculation. It contains a Python type called levtree, which has two members: a pointer to a wlevtree C type (called tree), which does all the calculations, and a PyObject* called wordlist, which points to a Python list of Python strings. Here is what I need:
- When I create a new instance of levtree, I use a constructor that takes a tuple of strings as its only input (this is the dictionary in which the instance will perform all its searches). This constructor has to create a new wordlist inside the new levtree instance and copy the contents of the input tuple into it. Here is my first code snippet and my first question:
static int
wlevtree_python_init(wlevtree_wlevtree_obj *self, PyObject *args, PyObject *kwds)
{
    Py_ssize_t numLines;  /* how many lines we passed for parsing */
    wchar_t** carg;       /* argument to pass to the C function */
    Py_ssize_t i;
    PyObject * strObj;    /* one string in the list */
    PyObject* intuple;
    /* the O! parses for a Python object (intuple) checked
       to be of type PyTuple_Type */
    if (!(PyArg_ParseTuple(args, "O!", &PyTuple_Type, &intuple)))
    {
        return -1;
    }
    /* get the number of lines passed to us */
    numLines = PyTuple_Size(intuple);
    /* should raise an error here. */
    if (numLines < 0)
    {
        return -1; /* Not a tuple */
    }
    carg = malloc(sizeof(wchar_t*)*numLines);
    self->wordlist = PyList_New(numLines);
    Py_IncRef(self->wordlist);
    for(i=0; i<numLines; i++)
    {
        strObj = PyTuple_GetItem(intuple, i);
        //PyList_Append(self->wordlist, string);
        PyList_SetItem(self->wordlist, i, strObj);
        Py_IncRef(strObj);
    }
    /* iterate over items of the list, grabbing strings and converting
       them to wide-character strings for the C tree */
    for (i=0; i<numLines; i++)
    {
        /* grab the string object from the next element of the list */
        strObj = PyList_GetItem(self->wordlist, i); /* Can't fail */
        /* make it a string */
        if(PyUnicode_Check(strObj))
        {
            /* note: PyUnicode_AsUnicode is deprecated and was removed in Python 3.12 */
            carg[i] = PyUnicode_AsUnicode( strObj );
            if(PyErr_Occurred())
            {
                return -1;
            }
        }
        else
        {
            strObj = PyUnicode_FromEncodedObject(strObj,NULL,NULL);
            if(PyErr_Occurred())
            {
                return -1;
            }
            carg[i] = PyUnicode_AsUnicode( strObj );
        }
    }
    self->tree = (wlevtree*) malloc(sizeof(wlevtree));
    wlevtree_init(self->tree,carg,numLines);
    free(carg);
    return 0;
}
Do I have to call Py_IncRef(self->wordlist); after self->wordlist = PyList_New(numLines);, or is it redundant because PyList_New already increments the reference count? I have the same doubt about PyList_SetItem(self->wordlist, i, strObj); and Py_IncRef(strObj);.
- When I destroy an instance of levtree, I want to call the C function that frees the space occupied by tree, destroy wordlist, and decrement the reference count of every string contained in wordlist. Here is my tp_dealloc:
static void
wlevtree_dealloc(wlevtree_wlevtree_obj* self)
{
    //wlevtree_clear(self);
    if(self->tree!=NULL)
    {
        wlevtree_free(self->tree);
    }
    free(self->tree);
    PyObject *tmp, *strObj;
    unsigned i;
    int size = PyList_Size(self->wordlist);
    for(i=0; i<size; i++)
    {
        strObj = PyList_GetItem(self->wordlist, i);
        Py_CLEAR(strObj);
    }
    Py_CLEAR(self->wordlist);
    Py_TYPE(self)->tp_free((PyObject *)self);
}
Is it correct to do all the deallocation work here? At the moment I don't have a tp_clear or a tp_free; do I need them? My code currently works on allocation but not on deallocation: even though I can call init on the same Python variable more than once, at the end of every Python script (which otherwise runs correctly) I get a "Segmentation fault", which makes me think that something goes wrong in the deallocation process.
tp_clear is only needed if you implement cyclic garbage collection. That does not appear to be necessary here, because you only hold references to Python unicode objects (through a list of them).
tp_dealloc is called when the reference count of the object drops to zero. This is where you destroy the object and its members. It should then free the memory occupied by the object by calling tp_free.
tp_free is where the memory for the object is actually freed. Implement it only if you implement tp_alloc yourself; otherwise PyType_Ready inherits suitable defaults for both.
The reason for the separation between tp_dealloc and tp_free is that if your type is subclassed, then only the subclass knows how the memory was allocated and how to free it properly.
If your type is a subclass of an existing type, your tp_dealloc may need to call the tp_dealloc of the base type, but that depends on the details of the case.
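To summarize, doing all the destruction work in tp_dealloc is correct, but the reference counting around wordlist is not. PyList_New already returns a new reference, so the extra Py_IncRef(self->wordlist) leaks the list. Py_IncRef(strObj), on the other hand, is required, because PyTuple_GetItem returns a borrowed reference and PyList_SetItem steals one. In tp_dealloc, the loop that calls Py_CLEAR on each item releases references that belong to the list, which releases them itself when it is destroyed. With both mistakes present, the leaked list keeps pointers to strings that may already have been freed, which is a likely source of the segfault. You also leak carg when exiting the function with an error.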