Getting a SIGSEGV when calling a Python 3 extension module function operating on a Py_buffer

I'm toying around with Python C extension modules and I've got this simple function:

static PyObject *e_parse_metadata(PyObject *self, PyObject *args) {
    Py_buffer buf;

    if(!PyArg_ParseTuple(args, "y#", &buf)) {
        // interpreter raises exception, we return NULL to indicate failure
        return NULL;
    }

    fprintf(stdout, "extension: %c%c\n\n", *((char *) buf.buf + 0), *((char *) buf.buf + 1)); // should print "BM"

    PyBuffer_Release(&buf);

    return PyLong_FromLong(33l);
}

It attempts to obtain a Py_buffer from an argument passed to it from within Python. It then displays the first 2 bytes from the buffer as characters to stdout, releases the buffer, and returns a reference to a new PyObject representing the integer 33.

Next, I've got this Python script that uses the function:

#!/usr/bin/env python3

import bbmp_utils # my module

with open('./mit.bmp', 'rb') as mit:
    if(mit.readable()):
        filedata = mit.read()
        res = bbmp_utils.parse_metadata(filedata) # call to my function in the extension module
        print(res, type(res))

This results in the extension module successfully printing the first 2 bytes of the byte stream (extension: BM) to stdout, but then the process terminates:

fish: “env PYTHONPATH=./build_dbg pyth…” terminated by signal SIGSEGV (Address boundary error)

Strangely enough, passing the bytes object returned by read() directly to my extension function doesn't crash at all:

res = bbmp_utils.parse_metadata(mit.read())

Why does the first example crash while the second one doesn't?

Solution

I was using the wrong format specifier when parsing Python arguments.

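y# does not fill a Py_buffer: it stores a pointer to the data in a const char * and the buffer's length in a Py_ssize_t, so PyArg_ParseTuple expects two output pointers, and I had passed only one (the address of my Py_buffer). Also note that y# only accepts read-only bytes-like objects.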

y*, which fills a Py_buffer, works as expected.

This is fine, but it still doesn't explain why one of the Python versions crashes and the other doesn't.
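A likely explanation is undefined behaviour rather than anything meaningful about the two call sites. PyArg_ParseTuple is a varargs function, so it cannot check how many output pointers were passed or what they point to; it pulls them off the argument list according to the format string. With "y#" it wrote the data pointer into the first member of the Py_buffer (buf.buf, which is why "BM" still printed), then read a second pointer that was never passed and wrote the length through it. PyBuffer_Release(&buf) afterwards reads buf.obj, which was never initialized. What those stray accesses hit depends on leftover stack and register contents, which can differ between the two scripts, so one may crash while the other appears to work. The toy parser below is hypothetical (it is not CPython's implementation) and only illustrates how the format string, not the caller, decides how many pointers get read.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy parser mimicking the varargs mechanism only. */
static int toy_parse(const char *data, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);

    if (strcmp(fmt, "y#") != 0) {
        va_end(ap);
        return 0; /* unsupported format in this toy */
    }

    /* The format says "two outputs", so two pointers are read,
       whether or not the caller actually passed two. */
    const char **out_data = va_arg(ap, const char **);
    size_t *out_len = va_arg(ap, size_t *); /* missing argument => undefined behaviour */

    *out_data = data;
    *out_len = size;

    va_end(ap);
    return 1;
}
```

Called correctly as toy_parse(p, n, "y#", &data, &len), both writes land in the caller's variables; called with only one pointer, the second va_arg reads whatever happens to be there, which mirrors the mismatch in the original code.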