python3 str object cannot pass PyUnicode_Check


I was writing a C extension function, which was supposed to accept a str object as argument. The code is shown below:

static PyObject *py_print_chars(PyObject *self, PyObject *o) {
    PyObject *bytes;
    char *s;
    if (!PyUnicode_Check(o)) {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }
    bytes = PyUnicode_AsUTF8String(o);
    s = PyBytes_AsString(bytes);
    print_chars(s);
    Py_DECREF(bytes);
    Py_RETURN_NONE;
}

But when I test the module in the Python 3 console, I find that str objects fail PyUnicode_Check:

>>> from sample2 import *
>>> print_chars('Hello world')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Expected string

As far as I know, Python 3's str type is called PyUnicode in C, and the C code above was written with reference to "Python Cookbook", 3rd edition, section 15.13. I just can't work out the problem. Can anybody tell me what's wrong with my code?

Here is what the "Python Cookbook", 3rd edition, says:

If for some reason, you are working directly with a PyObject * and can’t use PyArg_ParseTuple(), the following code samples show how you can check and extract a suitable char * reference, from both a bytes and string object:

/* Some Python Object (obtained somehow) */
PyObject *obj;

/* Conversion from bytes */
{
    char *s;
    s = PyBytes_AsString(obj);
    if (!s) {
       return NULL;   /* TypeError already raised */
    }
    print_chars(s);
}
/* Conversion to UTF-8 bytes from a string */
{
    PyObject *bytes;
    char *s;
    if (!PyUnicode_Check(obj)) {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }
    bytes = PyUnicode_AsUTF8String(obj);
    s = PyBytes_AsString(bytes);
    print_chars(s);
    Py_DECREF(bytes);
}

And the whole code:

#include "Python.h"
#include "sample.h"

static PyObject *py_print_chars(PyObject *self, PyObject *o) {
    PyObject *bytes;
    char *s;
    if (!PyUnicode_Check(o)) {
        PyErr_SetString(PyExc_TypeError, "Expected string");
        return NULL;
    }
    bytes = PyUnicode_AsUTF8String(o);
    s = PyBytes_AsString(bytes);
    print_chars(s);
    Py_DECREF(bytes);
    Py_RETURN_NONE;
}

/* Module method table */
static PyMethodDef SampleMethods[] = {
    {"print_chars", py_print_chars, METH_VARARGS, "print character"},
    { NULL, NULL, 0, NULL}
};

/* Module structure */
static struct PyModuleDef samplemodule = {
    PyModuleDef_HEAD_INIT,
    "sample",
    "A sample module",
    -1,
    SampleMethods
};

/* Module initialization function */
PyMODINIT_FUNC
PyInit_sample2(void) {
    return PyModule_Create(&samplemodule);
}

Solution

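  • If the goal is to accept exactly one argument, declare the function as METH_O, not METH_VARARGS. METH_O passes the single argument through unwrapped, so o is the str object itself. METH_VARARGS wraps the arguments in a tuple, so with the method table above o is a one-element tuple and PyUnicode_Check(o) fails. The tuple would have to be unpacked or parsed (for example with PyArg_ParseTuple()) to get at the str object inside.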