c++ protocol-buffers ctypes

Sending 0 byte values from C++ to Python via protocol buffers


I'm trying to send image data in a bytes field from a C++ library I built to a Python program. The problem is that bytes with the value 0 seem to break the protocol buffer parsing. With any non-zero values everything works fine.

image_buffer.proto

syntax = "proto3";

package transport;

message Response{
    repeated bytes img_data = 1;
}

test.cpp

extern "C" void* calculate()
{
    transport::Response response;

    unsigned char* data = new unsigned char [3];


    data[0] = 11;
    data[1] = 0; // this breaks the code, if other than 0 everything works fine
    data[2] = 120;

    response.add_img_data(data, 3);

    size_t out_size = response.ByteSizeLong();
    void* out = malloc(out_size);
    response.SerializeToArray(out, out_size);

    return out;
}

test.py

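import ctypes
import ctypes.util
from ctypes import c_char_p

import image_buffer_pb2 as img_buf  # assumed name of the protoc-generated module

lib = ctypes.util.find_library("myLib")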
libc = ctypes.cdll.LoadLibrary(lib)

calculate = libc.calculate
calculate.restype = c_char_p

result = calculate()

response = img_buf.Response()
response.ParseFromString(result) # fails with a parse error when the data contains 0 values

for idx,img in enumerate(response.img_data):
    print("num pixels:",len(img))
    for pix in img:
        print(int(pix))

I've been struggling with this for a few days now, so if anyone has a hint I'd be very grateful!


Solution

  • Don't use .restype = c_char_p for binary data. ctypes treats a c_char_p return value as a null-terminated string and converts it to a Python bytes object, so the result is cut off at the first 0 byte. Instead, use POINTER(c_char), which returns the raw data pointer and lets you slice out exactly the bytes you need. You'll need to know the size of the returned buffer, however, so the function has to report it, for example through an output parameter.

    Here's an example:

    test.cpp

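    #include <cstddef>  // size_t

    #ifdef _WIN32
    #   define API __declspec(dllexport)
    #else
    #   define API
    #endif
    
    extern "C" API const char* calculate(size_t* pout_size)
    {
        // hard-coded data with embedded nulls for demonstration
        // (const: string literals must not be modified)
        static const char* data = "some\0test\0data";
        *pout_size = 14;  // 4 + 1 + 4 + 1 + 4 bytes, not counting the final terminator
        return data;
    }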
    

    test.py

    import ctypes as ct
    
    dll = ct.CDLL('./test')
    dll.calculate.argtypes = ct.POINTER(ct.c_size_t),
    dll.calculate.restype = ct.POINTER(ct.c_char)
    
    def calculate():
        out_size = ct.c_size_t()                    # ctypes storage for output parameter
        result = dll.calculate(ct.byref(out_size))  # pass by reference
        return result[:out_size.value]              # slice to correct size and return byte string
    
    result = calculate()
    print(result)
    

    Output:

    b'some\x00test\x00data'
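
The truncation described above can also be reproduced without a compiled library. The following is a minimal sketch that uses only ctypes and a buffer created in Python. The three bytes mirror the test data from the question, and the variable names are made up for illustration.

```python
import ctypes as ct

# Three bytes with an embedded zero, mirroring the question's test data.
raw = bytes([11, 0, 120])

# C buffer owned by Python; ctypes appends a terminating zero byte.
buf = ct.create_string_buffer(raw)

# Viewed as c_char_p, the data ends at the first zero byte.
as_c_string = ct.cast(buf, ct.c_char_p).value

# Viewed as POINTER(c_char) with an explicit length, all bytes survive.
ptr = ct.cast(buf, ct.POINTER(ct.c_char))
as_slice = ptr[:len(raw)]                   # slicing the pointer yields bytes
as_string_at = ct.string_at(ptr, len(raw))  # equivalent, via ctypes.string_at

print(len(as_c_string), len(as_slice), len(as_string_at))
```

In the question's setup, the same idea would presumably mean having calculate() report the result of ByteSizeLong() through an output parameter, and then passing the correctly sized bytes object (from slicing or ct.string_at) to ParseFromString.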