I'm toying around with Python C extension modules and I've got this simple function:
static PyObject *e_parse_metadata(PyObject *self, PyObject *args) {
    Py_buffer buf;
    if(!PyArg_ParseTuple(args, "y#", &buf)) {
        // interpreter raises exception, we return NULL to indicate failure
        return NULL;
    }
    fprintf(stdout, "extension: %c%c\n\n", *((char *) buf.buf) + 0, *((char*) buf.buf + 1)); // should print "BM"
    PyBuffer_Release(&buf);
    return PyLong_FromLong(33l);
}
It attempts to obtain a Py_buffer from an argument passed to it from within Python. It then prints the first 2 bytes of the buffer as characters to stdout, releases the buffer, and returns a new reference to a PyObject representing the integer 33.
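For comparison, here is a rough pure-Python sketch of what the function is meant to do. It is not part of the extension: first_two_bytes and the sample data are names made up for illustration, and it assumes a plain bytes or bytearray argument. memoryview is used because it goes through the same buffer protocol that Py_buffer exposes in C.

```python
def first_two_bytes(data):
    """Return the first two bytes of a bytes-like object as text (hypothetical helper)."""
    # memoryview acquires the object's buffer, much like filling a Py_buffer in C,
    # and the with-block releases it again, like PyBuffer_Release.
    with memoryview(data) as view:
        if view.nbytes < 2:
            raise ValueError("need at least two bytes")
        return bytes(view[:2]).decode("ascii")


# Made-up sample standing in for the contents of mit.bmp.
sample = b"BM" + bytes(12)
print("extension:", first_two_bytes(sample))
```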
Next I've got this Python example utilizing said function:
#!/usr/bin/env python3
import bbmp_utils # my module
with open('./mit.bmp', 'rb') as mit:
    if(mit.readable()):
        filedata = mit.read()
        res = bbmp_utils.parse_metadata(filedata) # call to my function in the extension module
        print(res, type(res))
This results in the extension module successfully printing the first 2 bytes of the byte stream (extension: BM) to stdout, but the process then terminates:
fish: “env PYTHONPATH=./build_dbg pyth…” terminated by signal SIGSEGV (Address boundary error)
Strangely enough, passing the bytes instance directly to my extension function doesn't cause a crash at all, e.g.
res = bbmp_utils.parse_metadata(mit.read())
Why does the first example crash while the second one doesn't?
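I was using the wrong format specifier when parsing Python arguments. y# does not fill a Py_buffer: it stores a const char * and the length of the buffer, so PyArg_ParseTuple needs the address of a pointer and the address of a length variable, and I had passed only &buf. Also note that the # variant assumes a read-only buffer. y* is the variant that fills a Py_buffer, and with it the function works as expected.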
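To see what a fully initialised Py_buffer carries, the structure can be inspected from Python with ctypes. This is only a sketch and assumes CPython 3: the PyBuffer class is my own mirror of the Py_buffer layout from the C API docs, and inspect_buffer is a made-up helper. It calls PyObject_GetBuffer, which fills every field, including the obj reference that PyBuffer_Release later uses.

```python
import ctypes


class PyBuffer(ctypes.Structure):
    """ctypes mirror of CPython 3's Py_buffer (layout assumed from the C API docs)."""
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("obj", ctypes.c_void_p),  # owning object; PyBuffer_Release reads this
        ("len", ctypes.c_ssize_t),
        ("itemsize", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
        ("ndim", ctypes.c_int),
        ("format", ctypes.c_char_p),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("suboffsets", ctypes.POINTER(ctypes.c_ssize_t)),
        ("internal", ctypes.c_void_p),
    ]


PyBUF_SIMPLE = 0  # same value as the C macro

_get_buffer = ctypes.pythonapi.PyObject_GetBuffer
_get_buffer.argtypes = [ctypes.py_object, ctypes.POINTER(PyBuffer), ctypes.c_int]
_get_buffer.restype = ctypes.c_int

_release = ctypes.pythonapi.PyBuffer_Release
_release.argtypes = [ctypes.POINTER(PyBuffer)]
_release.restype = None


def inspect_buffer(data):
    """Fill a Py_buffer for `data`, report a few fields, then release it."""
    view = PyBuffer()
    _get_buffer(data, ctypes.byref(view), PyBUF_SIMPLE)
    try:
        return {
            "first_two": ctypes.string_at(view.buf, min(view.len, 2)),
            "len": view.len,
            "readonly": bool(view.readonly),
            "obj_is_set": view.obj is not None,
        }
    finally:
        _release(ctypes.byref(view))


print(inspect_buffer(b"BM" + bytes(12)))
```

With y#, only a pointer and a length are written, so the remaining fields of a Py_buffer passed by mistake are never set.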
This is fine, but it still doesn't explain why one of the Python versions crashes and the other doesn't.