
How Does String Conversion Between PyUnicode String and C String Work?


I have a PyUnicode object I'm trying to convert back to a C string (char *).

The way I am trying to do it does not seem to be working. Here is my code:

PyObject * objectCompName = PyTuple_GET_ITEM(compTuple, (Py_ssize_t) 0);
PyObject * ooCompName = PyUnicode_AsASCIIString(objectCompName);
char * compName = PyBytes_AsString(ooCompName);
Py_DECREF(ooCompName);

Is there another/better way I should be doing this?


Solution

  • If a UTF-8 encoded char * is acceptable, use PyUnicode_AsUTF8AndSize (available since Python 3.3):

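    // PySequence_GetItem returns a new reference, which must be released later.
    PyObject *objectCompName = PySequence_GetItem(compTuple, 0);
    if (!objectCompName) {
        return NULL;
    }
    
    Py_ssize_t size;
    // The return type is const char * since Python 3.7.
    const char *ptr = PyUnicode_AsUTF8AndSize(objectCompName, &size);
    if (!ptr) {
        Py_DECREF(objectCompName);
        return NULL;
    }
    
    // The buffer that ptr points to is owned by objectCompName and is only
    // valid while that object is alive. Copy it (for example with `strdup`)
    // if you need it for longer, then call Py_DECREF(objectCompName).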
    

    Also, understand that it is mandatory to check the return value of every Py* function call that can fail.

    Here PySequence_GetItem will return NULL if compTuple is not a sequence, or if index 0 raises IndexError (for example, on an empty tuple). PyUnicode_AsUTF8AndSize will return NULL if objectCompName is not a str object. If you ignore these return values, CPython can crash with SIGSEGV when one of those conditions occurs.
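
The copy step mentioned in the code comment can be sketched in plain C as follows. This is only an illustration, not part of the Python C API: the helper name `copy_utf8_buffer` is hypothetical, and it assumes you pass it the pointer and size obtained from PyUnicode_AsUTF8AndSize. It copies by length rather than with `strdup`, because a Python str may contain embedded NUL characters that `strdup` would stop at.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a CPython function): returns a malloc'd,
 * NUL-terminated copy of the first `size` bytes at `ptr`.
 * Returns NULL if the allocation fails. The caller must free() the result. */
char *copy_utf8_buffer(const char *ptr, size_t size)
{
    if (size == SIZE_MAX) {
        return NULL;              /* size + 1 would overflow */
    }

    char *copy = malloc(size + 1);
    if (copy == NULL) {
        return NULL;
    }

    memcpy(copy, ptr, size);      /* copies embedded NUL bytes as well */
    copy[size] = '\0';            /* terminate so it is usable as a C string */
    return copy;
}
```

With the answer's variables, the call would look like `char *compName = copy_utf8_buffer(ptr, (size_t) size);`, made before the reference to objectCompName is released. After that, compName stays valid regardless of what happens to the Python object, until you call `free(compName)`.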