Search code examples
hexthriftdecodingthrift-protocol

Decoding Thrift Object what are these extra bytes?


I'm working on writing a pure JS thrift decoder that doesn't depend on thrift definitions. I have been following this handy guide which has been my bible for the past few days: https://erikvanoosten.github.io/thrift-missing-specification/

I almost have my parser working, but there is a string type that throws a wrench into the program, and I don't quite understand what it's doing. Here is an excerpt of the hexdump, which I did my best to annotate:

Correctly parsing:

000001a0  0a 32 30 32 31 2d 31 31  2d 32 34 16 02 00 18 07  |.2021-11-24.....|
........................blah blah blah............|  |  |
                                       Object End-|  |  |
                           0x18 & 0xF = 0x8 = Binary-|  |
             The binary sequence is 0x7 characters long-|
000001b0  53 65 61 74 74 6c 65 18  02 55 53 18 02 55 53 18  |Seattle..US..US.|
          S  E  A  T  T  L  E  |___|  U  S  |___| U  S
    Another string, 2 bytes long |------------|

So far so good.

But then I get to this point: There string I am trying to extract is "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4592.0 Safari/537.36 Edg/94.0.975.1" and is 134 bytes long.

000001c0  09 54 61 68 6f 65 2c 20  43 41 12 12 00 00 08 c8  |.Tahoe, CA......|
                                 Object ends here-|  |  |
                           0x8 & 0xF = 0x8 = Binary -|  |
                                  0xc8 bytes long (200)-|
000001d0  01 86 01 4d 6f 7a 69 6c  6c 61 2f 35 2e 30 20 28  |...Mozilla/5.0 (|
          |  |  |  M  o  z  i  l   l  a  
        ???? |--|-134, encoded as var-int
000001e0  4d 61 63 69 6e 74 6f 73  68 3b 20 49 6e 74 65 6c  |Macintosh; Intel|

As you can see, I have a byte sequence 0x08 0xC8 0x01 0x86 0x01 which contains the length of the string I'm looking for, is followed by the string I'm looking for but has 3 extra bytes that are unclear in purpose.

The 0x01 is especially confusing as it neither a type identifier, nor seems to have a concrete value.

What am I missing?


Solution

  • The byte sequence reads as follows

    • 0x08: String type, the next 2 bytes define the elementId
    • 0xC8 0x01: ElementId, encoded in 16 bits
    • 0x86 0x01: String length, encoded as var int

    It turns out that if the type identifier does not contain bits defining the elementId, the elementId will be stored in the next 2 bytes.