
Generating a protobuf serialized string in C++ with SerializeToArray and parsing it in Python with ParseFromString


  • C++
  1. clang++ 14.0.6
  2. protobuf v21.7

C++ serialization:

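const size_t size = report.ByteSizeLong();  // compute the serialized size once
*response = new char[size];
report.SerializeToArray(*response, static_cast<int>(size));  // SerializeToArray takes an int size
*response_length = static_cast<int>(size);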
  • Python
  1. Python 3.8.10
  2. protoc v26.1 (used to generate the .py and .pyi files)
  3. protobuf==5.26.1

Python deserialization:

from ctypes import cdll, c_char_p, c_int, byref

__so = cdll.LoadLibrary(self.__file)
__report_in_pb = c_char_p()
__report_ob_size = c_int()
__so.handle_message_v1(__data, len(__data),byref(__report_in_pb), byref(__report_ob_size))

if __report_ob_size.value > 0:
    try:
        __report: RulesReport = RulesReport()
        __report.ParseFromString(__report_in_pb.value)
        print(__report)
    except Exception as e:
        print(f'Could not analyse the dataview data with timestamp {__timestamp}: {e}')
  • Proto

My .proto file looks like this:

message ReviewResult {
  message ZeroSumItem {
      bool is_passed = 1;
      string message = 2;
  }
  message AccumulatorItem {
      double total = 1;
      uint64 num_measurements = 2;
  }
  uint64 timestamp = 1;
  uint32 sequence_number = 2;
  reserved 3;
  oneof metric_data {
      ZeroSumItem boolean_indicator = 4;
      AccumulatorItem accumulator_indicator = 5;
      double statistics_indicator = 6;
  }
}

/// message for report rules situation
message RulesReport {
  reserved 1 to 2;
  repeated ReviewResult review_results = 3;
}
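For reference, the AccumulatorItem part of this schema can be encoded by hand for the two total values tried below (3.0 and 1.1). This is a minimal sketch, assuming the standard protobuf wire format (a double field is written as its tag followed by 8 little-endian bytes, a uint64 field as its tag followed by a varint). encode_varint and encode_accumulator_item are hypothetical helpers, not part of the protobuf library, and both fields are assumed to be non-zero so that proto3 would not omit either of them.

```python
import struct


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_accumulator_item(total: float, num_measurements: int) -> bytes:
    """Hand-encode AccumulatorItem (hypothetical helper; assumes both fields are non-zero)."""
    tag_total = (1 << 3) | 1  # field 1, wire type 1 (64-bit) -> 0x09
    tag_num = (2 << 3) | 0    # field 2, wire type 0 (varint) -> 0x10
    return (bytes([tag_total]) + struct.pack("<d", total)
            + bytes([tag_num]) + encode_varint(num_measurements))


for total in (3.0, 1.1):
    encoded = encode_accumulator_item(total, 3)
    print(total, encoded.hex(" "), "contains NUL byte:", b"\x00" in encoded)
# 3.0 09 00 00 00 00 00 00 08 40 10 03 contains NUL byte: True
# 1.1 09 9a 99 99 99 99 99 f1 3f 10 03 contains NUL byte: False
```

With total = 3.0, six of the eight payload bytes of the double are zero; with 1.1, none are.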

When I set a value converted from int to double in the message, like this:

int num_measurements_ = 3;
double num_measurements_double = static_cast<double>(num_measurements_);
review_result->mutable_accumulator_indicator()->set_total(num_measurements_double);
review_result->mutable_accumulator_indicator()->set_num_measurements(num_measurements_);

I get this error:

// from `print(f'Could not analyse the dataview data with timestamp {__timestamp}: {e}')`
Could not analyse the dataview data with timestamp 1717840976634: Error parsing message

Why?

When I use a literal double value instead:

int num_measurements_ = 3;
double num_measurements_double = 1.1;
review_result->mutable_accumulator_indicator()->set_total(num_measurements_double);
review_result->mutable_accumulator_indicator()->set_num_measurements(num_measurements_);

it works (no error is logged):

// from `print(__report)`
review_results {
  timestamp: 1717840968128529992
  sequence_number: 742681
  accumulator_indicator {
    total: 1.1
    num_measurements: 3
  }
}

I also tried downgrading protoc to v21.7 to regenerate the .py and .pyi files, but that didn't help.


Solution

  • The documentation of c_char_p makes an important distinction between NUL-terminated strings and pointers to arbitrary bytes:

    Represents the C char * datatype when it points to a zero-terminated string. For a general character pointer that may also point to binary data, POINTER(c_char) must be used.

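    That is what happens here. A double is serialized as 8 raw little-endian bytes, and 3.0 (0x4008000000000000) is encoded as 00 00 00 00 00 00 08 40, so c_char_p.value stops at the first zero byte and ParseFromString receives a truncated message. 1.1 (0x3FF199999999999A) is encoded as 9A 99 99 99 99 99 F1 3F, which contains no zero byte, so that case happened to work.

    What you can do instead is: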

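    from ctypes import POINTER, byref, c_char, c_int, cdll, string_at

    __so = cdll.LoadLibrary(self.__file)
    __report_in_pb = POINTER(c_char)()  # a pointer instance (note the trailing ()), not the type
    __report_ob_size = c_int()
    __so.handle_message_v1(__data, len(__data), byref(__report_in_pb), byref(__report_ob_size))

    # Copy exactly __report_ob_size.value bytes, embedded zero bytes included
    message: bytes = string_at(__report_in_pb, __report_ob_size.value)
    __report: RulesReport = RulesReport()
    __report.ParseFromString(message)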
    

    This works because string_at copies exactly the given number of bytes, starting at the pointer, into a bytes object, zero bytes included, instead of stopping at the first NUL.
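The difference can be reproduced without the shared library. This minimal sketch uses a local ctypes buffer as an assumed stand-in for the memory filled in by handle_message_v1, holding the 11 bytes that an AccumulatorItem with total = 3.0 and num_measurements = 3 encodes to. Reading it through c_char_p.value stops at the first zero byte, while string_at, or slicing the POINTER(c_char) with the length, copies exactly the requested number of bytes.

```python
import ctypes
from ctypes import POINTER, c_char, c_char_p, c_int

# Assumed stand-in for the library's output: AccumulatorItem{total: 3.0, num_measurements: 3}
# on the wire (tag 0x09, 8 little-endian bytes of 3.0, tag 0x10, varint 3).
payload = bytes.fromhex("09 00 00 00 00 00 00 08 40 10 03")
storage = ctypes.create_string_buffer(payload)  # owns the memory for this demo
report_ptr = ctypes.cast(storage, POINTER(c_char))
report_size = c_int(len(payload))

# What the question's code effectively does: treat the pointer as a NUL-terminated C string.
as_c_string = ctypes.cast(report_ptr, c_char_p).value

# What the answer does: copy exactly report_size bytes, zero bytes included.
via_string_at = ctypes.string_at(report_ptr, report_size.value)

# Equivalent alternative: slicing a POINTER(c_char) also returns a bytes object.
via_slice = report_ptr[:report_size.value]

print(as_c_string)                            # b'\t' (only the tag byte survives)
print(via_string_at == via_slice == payload)  # True
```

Both copying forms produce a new bytes object, so the result stays valid after the C-side buffer is released.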