I'm working on writing a pure JS thrift decoder that doesn't depend on thrift definitions. I have been following this handy guide which has been my bible for the past few days: https://erikvanoosten.github.io/thrift-missing-specification/
I almost have my parser working, but there is a string type that throws a wrench into the program, and I don't quite understand what it's doing. Here is an excerpt of the hexdump, which I did my best to annotate:
Correctly parsing:
000001a0 0a 32 30 32 31 2d 31 31 2d 32 34 16 02 00 18 07 |.2021-11-24.....|
  (earlier bytes: blah blah blah)
  0x00: Object End
  0x18: 0x18 & 0xF = 0x8 = Binary
  0x07: The binary sequence is 0x7 characters long
000001b0 53 65 61 74 74 6c 65 18 02 55 53 18 02 55 53 18 |Seattle..US..US.|
  53 65 61 74 74 6c 65: S E A T T L E
  18 02 55 53 (twice): Another string, 2 bytes long: U S
So far so good.
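For reference, the part that works can be sketched roughly like this. The helper names `splitFieldHeader` and `readShortBinary` are made up for illustration, and the sketch assumes ASCII text and a length below 128 so that the length fits in a single byte:

```javascript
// Hypothetical helper: split a field header byte into its two nibbles.
function splitFieldHeader(byte) {
  return {
    fieldIdDelta: (byte >> 4) & 0x0f, // upper 4 bits
    typeId: byte & 0x0f,              // lower 4 bits; 0x8 = binary/string
  };
}

// Hypothetical helper: read a binary value whose length is a single byte.
// Assumes ASCII content and length < 128 (one-byte var int).
function readShortBinary(bytes, offset) {
  const length = bytes[offset];
  const start = offset + 1;
  const text = String.fromCharCode(...bytes.slice(start, start + length));
  return { text, next: start + length };
}

// Bytes taken from the dump above: 18 07 "Seattle"
const sample = [0x18, 0x07, 0x53, 0x65, 0x61, 0x74, 0x74, 0x6c, 0x65];
const header = splitFieldHeader(sample[0]);
const value = readShortBinary(sample, 1);
console.log(header, value);
```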
But then I get to this point:
The string I am trying to extract is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4592.0 Safari/537.36 Edg/94.0.975.1" and is 134 bytes long.
000001c0 09 54 61 68 6f 65 2c 20 43 41 12 12 00 00 08 c8 |.Tahoe, CA......|
  0x00: Object ends here
  0x08: 0x8 & 0xF = 0x8 = Binary
  0xc8: 0xc8 bytes long (200)
000001d0 01 86 01 4d 6f 7a 69 6c 6c 61 2f 35 2e 30 20 28 |...Mozilla/5.0 (|
  0x01: ????
  0x86 0x01: 134, encoded as var-int
  4d 6f 7a 69 6c 6c 61: M o z i l l a
000001e0 4d 61 63 69 6e 74 6f 73 68 3b 20 49 6e 74 65 6c |Macintosh; Intel|
As you can see, I have a byte sequence 0x08 0xC8 0x01 0x86 0x01 which contains the length of the string I'm looking for and is followed by the string itself, but it has 3 extra bytes whose purpose is unclear.
The 0x01 is especially confusing, as it is neither a type identifier nor does it seem to have a concrete value.
What am I missing?
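The byte sequence reads as follows:
0x08: String type; the upper 4 bits (the elementId delta) are zero, so the elementId follows explicitly
0xC8 0x01: ElementId, a 16-bit integer encoded as a zigzag var int (var int 200, which zigzag-decodes to elementId 100)
0x86 0x01: String length, encoded as var int (134)
It turns out that if the type identifier byte does not contain bits defining the elementId (its upper 4 bits are zero), the elementId is stored right after it as a zigzag var int. Here that happens to take 2 bytes, but it is not a fixed 2-byte field.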
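A minimal sketch of how a field header could be read with both forms handled. The function names (`readVarint`, `zigzagDecode`, `readFieldHeader`) are my own, not from any Thrift library, and the sketch assumes the compact protocol layout described in the guide: delta in the upper 4 bits, type in the lower 4 bits, and a zigzag var int elementId when the delta is zero.

```javascript
// Read an unsigned var int (7 bits per byte, least significant group first).
function readVarint(bytes, offset) {
  let value = 0;
  let shift = 0;
  let pos = offset;
  while (true) {
    if (pos >= bytes.length) throw new RangeError('truncated var int');
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift; // multiply instead of << to avoid 32-bit overflow
    if ((b & 0x80) === 0) break;      // high bit clear: last byte of the var int
    shift += 7;
  }
  return { value, next: pos };
}

// Zigzag decode; only valid for values that fit in 32 bits (enough for an i16 id).
function zigzagDecode(n) {
  return (n >>> 1) ^ -(n & 1);
}

// Read one field header. lastFieldId is the previous elementId in the same struct.
function readFieldHeader(bytes, offset, lastFieldId) {
  const header = bytes[offset];
  if (header === 0x00) return { stop: true, next: offset + 1 }; // end of struct
  const typeId = header & 0x0f;
  const delta = header >> 4;
  if (delta !== 0) {
    // Short form: elementId is the previous id plus the delta.
    return { typeId, fieldId: lastFieldId + delta, next: offset + 1 };
  }
  // Long form: elementId follows as a zigzag var int.
  const id = readVarint(bytes, offset + 1);
  return { typeId, fieldId: zigzagDecode(id.value), next: id.next };
}

// The confusing sequence from the dump: 08 c8 01 86 01
const bytes = [0x08, 0xc8, 0x01, 0x86, 0x01];
const field = readFieldHeader(bytes, 0, 0);
const len = readVarint(bytes, field.next);
console.log(field, len);
```

With these bytes, the header comes out as type 8 with elementId 100, and the following var int gives the expected string length of 134.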