I'm reverse-engineering a custom binary file format. I found a data structure that represents UTF-8 strings: one or two header bytes store the string length, followed by the actual string data.
╔════════════╦═══════════════╦═══════════════════╗
║ first byte ║ optional byte ║ UTF-8 string data ║
╚════════════╩═══════════════╩═══════════════════╝
The second header byte is optional; it's only present when the string length is greater than 128 bytes. When the length is 128 bytes or less, decoding it is easy (the length is simply the first byte + 1). However, when the length is greater than 128, I can't work out how to calculate it. So I ran experiments, generating many binary files with different string lengths; the results are below. Header bytes are shown in hex and string lengths are in bytes.
╔══════════════╦══════════════╦════════════════╗
║ Byte 1 (hex) ║ Byte 2 (hex) ║ Length (bytes) ║
╠══════════════╬══════════════╬════════════════╣
║ 7D           ║ N/A          ║ 126            ║
║ 7E           ║ N/A          ║ 127            ║
║ 7F           ║ N/A          ║ 128            ║
║ 80           ║ 01           ║ 129            ║
║ 81           ║ 01           ║ 130            ║
║ C7           ║ 01           ║ 200            ║
║ C8           ║ 01           ║ 201            ║
║ F9           ║ 01           ║ 250            ║
║ FE           ║ 01           ║ 255            ║
║ FF           ║ 01           ║ 256            ║
║ 80           ║ 02           ║ 257            ║
║ 81           ║ 02           ║ 258            ║
║ 82           ║ 02           ║ 259            ║
║ F3           ║ 03           ║ 500            ║
║ F4           ║ 03           ║ 501            ║
║ F5           ║ 03           ║ 502            ║
║ F6           ║ 03           ║ 503            ║
║ 80           ║ 04           ║ 513            ║
╚══════════════╩══════════════╩════════════════╝
I read somewhere that Pascal/Delphi uses a string format with a header that stores the string length, instead of null-terminated strings as in C, which looks similar to my case. My question is: do you have any idea what this format is, and how can I calculate the string length when it's greater than 128 bytes?
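When the high bit (0x80) of the first byte is set, a second byte follows, and you can calculate the length as (FirstByte and 0x7F) + 0x80 * SecondByte + 1. When there is no second byte, the length is simply FirstByte + 1.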
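A minimal Python sketch of that formula, plus the single-byte case, checked against rows from the table. It assumes the header is never longer than two bytes and that the second byte stays below 0x80 (true for every sample above); the names `decode_length`, `encode_length` and `read_string` are hypothetical. Read this way, the header looks like a 7-bit variable-length integer holding length - 1 (similar to unsigned LEB128), but only the one- and two-byte samples support that guess.

```python
def decode_length(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Return (string_length_in_bytes, header_size) for the header at buf[offset]."""
    first = buf[offset]
    if first < 0x80:
        # High bit clear: single header byte, length = value + 1 (0x7F -> 128).
        return first + 1, 1
    second = buf[offset + 1]
    # High bit set: low 7 bits of byte 1, plus byte 2 * 0x80, plus 1.
    # Assumption: byte 2 never has its own high bit set (no third byte).
    return (first & 0x7F) + 0x80 * second + 1, 2


def encode_length(length: int) -> bytes:
    """Inverse of decode_length, limited to lengths 1..16384 (byte 2 < 0x80)."""
    value = length - 1
    if not 0 <= value < 0x4000:
        raise ValueError("length outside the range this sketch handles")
    if value < 0x80:
        return bytes([value])
    return bytes([(value & 0x7F) | 0x80, value >> 7])


def read_string(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Decode one length-prefixed UTF-8 string; return (text, offset_after_it)."""
    length, header_size = decode_length(buf, offset)
    start = offset + header_size
    data = buf[start:start + length]
    if len(data) != length:
        raise ValueError("truncated string data")
    return data.decode("utf-8"), start + length


# Sanity check against some rows of the observed table.
OBSERVED = {
    b"\x7d": 126, b"\x7f": 128, b"\x80\x01": 129, b"\xc7\x01": 200,
    b"\xff\x01": 256, b"\x80\x02": 257, b"\xf3\x03": 500, b"\x80\x04": 513,
}
for header, expected in OBSERVED.items():
    assert decode_length(header) == (expected, len(header))
    assert encode_length(expected) == header

payload = encode_length(200) + ("x" * 200).encode("utf-8")
text, end = read_string(payload)
print(len(text), end)  # 200 202
```

Note that the length counts UTF-8 bytes, not characters, so `len(text)` only equals the stored length for ASCII data like the example above.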