python  c++  c++11  unicode  python-c-api

Unicode-friendly architecture for bridging Python's String and Bytes types to C++


I'm writing a C++ Python wrapper.

I am planning to have a generic Object class,

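    #include <Python.h>

    class Object {
    private:
        PyObject* p;
    public:
        Object(int i)    : p{ PyLong_FromLong(i) }    { }  // construct PyLong
        Object(double d) : p{ PyFloat_FromDouble(d) } { }  // construct PyFloat
        // ... further constructors elided ...
    };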

That is, the consumer can write Object{42} or Object{3.14}, and the Python runtime will construct a PyObject of the matching type. Object then stores the pointer in p.

Now I'm attempting to handle string types. I notice Python has a PyBytes and a PyString primitive, and I'm considering:

    Object(const char* cstr)     { /* construct PyBytes */ }
    Object(const std::string& s) { /* construct PyString */ }

But I think the issue is complicated by Unicode.

I could for example provide an additional constructor allowing construction of a Unicode PyString:

    Object( const std::string& s, const char* enc, const char* err=nullptr )
        : Object{ PyUnicode_Decode( s.c_str(), s.size(), enc, err ) } 
    { }

But is there anything smarter I can do? Can I examine std::string for its encoding and take care of directly invoking PyUnicode_Decode?

I am unfamiliar with Unicode handling in both C++ and Python, so I'm asking in advance for guidance.

EDIT: Reading up on C++ handling of Unicode, it appears that different operating systems may favour std::string or std::wstring. Hence it is probably relevant to point out that I'm attempting a multi-platform (Windows, Linux, OS X, Android, iOS) solution.


Solution

  • PyString/PyUnicode in Python 2 corresponds to PyBytes/PyUnicode in Python 3. Python 3 has no PyString; the compatibility header is on the Python 2 side: Python 2.6 and 2.7 ship bytesobject.h, which maps the PyBytes names to PyString.

    So depending on your target Python version, use PyString/PyUnicode or PyBytes/PyUnicode, but don't mix PyString and PyBytes. Map std::string/char* to PyBytes or PyString, and std::wstring/wchar_t* to PyUnicode.
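
On the question of examining a std::string for its encoding: a std::string is only a sequence of bytes and carries no encoding tag, which is why the mapping above sends it to a bytes object. The most a wrapper can do is check whether the bytes happen to be well-formed UTF-8 before deciding to decode them. The following is a minimal sketch of such a check in plain C++; the helper name looks_like_utf8 is hypothetical and is not part of the Python C API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the Python C API): returns true when the
// bytes in s are well-formed UTF-8. A true result does not prove the data
// was meant as UTF-8; it only means a strict UTF-8 decode would not fail.
inline bool looks_like_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra;     // number of continuation bytes expected
        unsigned long cp;      // code point being assembled
        unsigned long min_cp;  // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;     // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        i += extra + 1;
    }
    return true;
}
```

A wrapper constructor could, as one possible design, call PyUnicode_DecodeUTF8 when this check passes and fall back to PyBytes_FromStringAndSize otherwise. Note that plain ASCII and many short byte strings also pass the check, so an explicit encoding argument, as in the three-argument constructor from the question, remains the unambiguous option.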