c serialization protocol-buffers proto nanopb

Protocol Buffer nanopb Serializing string and decoding error utf-8 character


I am trying to serialize a string with nanopb and decode the message in Python/Java. I have no trouble with the ints; I can serialize and deserialize them. But when it comes to the string, I keep getting the same error: 'utf-8' codec can't decode byte 0xff in position 2: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte in field:

I thought it might be a Python decoding problem, so I changed with open('FileSerialized.bin', 'rb') as f: to

with open('FileSerialized.bin', encoding='utf-8') as f:

I tried a parser in Java and it gave the same error, so I assume the problem is in the way I am encoding the message in C. I am doing the following:

nanopb generated the following struct from the .proto:

typedef struct _ProtoExample {
    int32_t Value1;  // this is deserialized correctly
    char Value2[6];  // here is where I have trouble
} ProtoExample;

And I tried to populate the char array by doing the following:

pb_ostream_t stream = pb_ostream_from_buffer(buffer, BUFFER_SIZE);
ProtoExample Message;
Message.Value1 = S_generalConfig_s.EntityType;
Message.Value2[0] = 'a';

pb_encode(&stream, ProtoExample_fields, &Message);

When I decode the message, the error occurs while reading Value2.


Solution

  • ProtoExample Message;
    Message.Value1= S_generalConfig_s.EntityType;
    Message.Value2[0] = 'a';
    

    It's a good idea to initialize the message structures. Otherwise, any field you forget to set will contain random data. So change the first line to:

    ProtoExample Message = ProtoExample_init_default;
    

    Here ProtoExample_init_default is an initialization macro generated by nanopb, which contains any default values defined in the .proto file. You can also use ProtoExample_init_zero to initialize to empty values instead.

    The actual issue is that your string is unterminated. In C, strings have to end with a '\0' character to be valid. So you'll need to add:

    Message.Value2[1] = '\0';
    

    to set the terminator after your single character. If you add the default initialization, all the bytes are already zero, so this is somewhat redundant, but it's good programming practice to make sure strings are explicitly terminated.