Tags: python, c++, boost-python

C++ boost.python cannot convert const char* to str


I want to calculate something in C++ and return the result to Python. This is part of the C++ code:

Mat flow_map_x, flow_map_y;  // assigned below, so neither const nor a reference
std::vector<uchar> encoded_x, encoded_y;

flow_map_x = ...;
flow_map_y = ...;

Mat flow_img_x(flow_map_x.size(), CV_8UC1);
Mat flow_img_y(flow_map_y.size(), CV_8UC1);

encoded_x.resize(flow_img_x.total());
encoded_y.resize(flow_img_y.total());

memcpy(encoded_x.data(), flow_img_x.data, flow_img_x.total());
memcpy(encoded_y.data(), flow_img_y.data, flow_img_y.total());

bp::str tmp = bp::str((const char*) encoded_x.data());

The error when running the Python script is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

After debugging, I found that the error comes from this line:

bp::str tmp = bp::str((const char*) encoded_x.data());

I'm not good at C++. Could anyone tell me how to fix the error? Thanks in advance!


Solution

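  • You can't, because encoded_x.data() is not UTF-8 text. Constructing a bp::str from a const char* treats the buffer as a NUL-terminated string and, under Python 3, decodes it as UTF-8, which fails on raw pixel data. You probably want a bytes object holding a copy of the raw data: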

    Use PyObject* PyBytes_FromStringAndSize(const char *v, Py_ssize_t len), or PyByteArray_FromStringAndSize with the same arguments if you want a mutable bytearray:

    bp::object tmp(bp::handle<>(PyBytes_FromStringAndSize(
        // Data to make `bytes` object from
        reinterpret_cast<const char*>(encoded_x.data()),
        // Amount of data to read
        static_cast<Py_ssize_t>(encoded_x.size())
    )));
    

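    In this case, you can get rid of the vector and pass flow_img_x.data and flow_img_x.total() directly. This works because the Mat is CV_8UC1, so total() (the element count) equals the number of bytes; it also assumes the Mat is continuous in memory.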
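
Why the original conversion fails, and why the explicit size argument matters, can be illustrated without Python at all. The sketch below is plain standard C++; the helper names can_start_utf8_sequence and length_seen_as_c_string are hypothetical and are not part of Boost.Python or the Python C API.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// True if `b` may begin a UTF-8 sequence: ASCII (0x00-0x7F) or a
// multi-byte lead byte (0xC2-0xF4). Bytes 0x80-0xBF are continuation
// bytes, so a decoder rejects them in first position.
bool can_start_utf8_sequence(unsigned char b) {
    return b <= 0x7F || (b >= 0xC2 && b <= 0xF4);
}

// Number of bytes an API that expects a NUL-terminated string would
// read: everything before the first zero byte. If the buffer holds no
// zero byte this returns buf.size(), whereas a real strlen would keep
// reading past the end of the buffer.
std::size_t length_seen_as_c_string(const std::vector<unsigned char>& buf) {
    auto first_zero = std::find(buf.begin(), buf.end(), 0);
    return static_cast<std::size_t>(first_zero - buf.begin());
}
```

For a buffer such as {0x41, 0x00, 0x80, 0xFF}, a C-string API would see only one byte, and a buffer starting with 0x80 is rejected by a UTF-8 decoder right away, which matches the "invalid start byte" error in the question. Passing the pointer together with encoded_x.size() avoids both problems.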


    Or use a memoryview, which does not copy the data but exposes the std::vector's buffer directly.

    Use PyObject* PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags) (available since Python 3.3):

    bp::object tmp(bp::handle<>(PyMemoryView_FromMemory(
        reinterpret_cast<char*>(encoded_x.data()),
        static_cast<Py_ssize_t>(encoded_x.size()),
        PyBUF_WRITE  // Or `PyBUF_READ` if you want a read-only view
    )));
    

    (If the vector were const, you would use const_cast<char*>(reinterpret_cast<const char*>(encoded_x.data())) and pass only PyBUF_READ.)

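    In this case you have to make sure the vector stays alive, and is not resized, for as long as the memoryview is in use, because the view only points at the vector's buffer. In exchange, no unnecessary copy is made.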
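
The difference between the copying and the non-copying approach can be shown with a plain C++ analogy. This is only a sketch: ByteView, make_view and make_copy are hypothetical names standing in for what a memoryview and a bytes object do conceptually, and they are not the Python C API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: a pointer and a length, nothing more.
// Like a memoryview built from raw memory, it does not keep the
// underlying buffer alive.
struct ByteView {
    unsigned char* data;
    std::size_t size;
};

// Analogy for the memoryview approach: no bytes are copied, the view
// aliases the vector's storage.
ByteView make_view(std::vector<unsigned char>& buf) {
    return ByteView{buf.data(), buf.size()};
}

// Analogy for the bytes approach: an independent copy of the data.
std::vector<unsigned char> make_copy(const std::vector<unsigned char>& buf) {
    return buf;
}
```

A write through the view is visible in the original vector, while the copy is unaffected. The view's pointer dangles as soon as the vector is destroyed or reallocates (for example after a resize beyond its capacity), so in the Boost.Python case the vector has to outlive every Python object that refers to it, for instance by being a member of a long-lived object rather than a local variable.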