Why does V8 (a JavaScript module) deserialize two different hex strings into two identical objects?


I've been trying to figure out why this happens.

const v8 = require('v8')

let stringa = 'ff0d6f220762616c616e63655a200070824fdc89df747141410000000000220a6465706c6f7965644279220673797374656d220773746f726167656f220a63616e646964617465736f222c49364e4d6f2b4b5a3531634b2b39626f6543554f716e6570724361723566727752746771746c493350596b3d6f2205626c6f636b4e000000000000244022076465706f7369745a20000088b116afe3b5020000000000000022046e616d65220d4e65772056616c696461746f7222086f70657261746f72222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b7b04222c576b5a6c346470454666797436585a506662433270766f5777424d6a314f497161794958304b47714a654d3d6f2205626c6f636b4e000000000000000022076465706f7369745a20000088b116afe3b5020000000000000022046e616d65221147656e657369732056616c696461746f7222086f70657261746f72222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b2206766f746572736f222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b5a200000e8c69583abe311000000000000007b017b057b02220877697468647261776f222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b6f49005a100000dcce86b42ad07b017b017b02220673797374656d547b04'
let a = v8.deserialize(Buffer.from(stringa, 'hex'))
console.dir(a, {depth: null})

let stringb = 'ff0d6f220762616c616e63655a200070824fdc89df747141410000000000220a6465706c6f7965644279220673797374656d220773746f726167656f220a63616e646964617465736f222c49364e4d6f2b4b5a3531634b2b39626f6543554f716e6570724361723566727752746771746c493350596b3d6f2205626c6f636b491422076465706f7369745a20000088b116afe3b5020000000000000022046e616d65220d4e65772056616c696461746f7222086f70657261746f72222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b7b04222c576b5a6c346470454666797436585a506662433270766f5777424d6a314f497161794958304b47714a654d3d6f2205626c6f636b490022076465706f7369745a20000088b116afe3b5020000000000000022046e616d65221147656e657369732056616c696461746f7222086f70657261746f72222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b2206766f746572736f222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b5a200000e8c69583abe311000000000000007b017b057b02220877697468647261776f222b74676c6331613979667337653765763270376a6164396b71757774767870637a6e6e65633432327564726b6f49005a100000dcce86b42ad07b017b017b02220673797374656d547b04'
let b = v8.deserialize(Buffer.from(stringb, 'hex'))
console.dir(b, {depth: null})

stringa and stringb are clearly different. But when I print the two objects a and b, they turn out to be the same.

Could anyone please explain to me why? Thank you all very much in advance.


Solution

  • With the help of the SerializationTag definition, it's not that hard to decode these strings manually, e.g.:

    ff 0d // non-legacy version: 13
    6f // object start
    22 07 // one-byte string, length=7
    62 61 6c 61 6e 63 65 // "balance"
    5a 20 // bigint, bitfield: sign=0 length=16
    00 70 82 4f dc 89 df 74 71 41 41 00 00 00 00 00 
    22 0a // one-byte string, length=10
    64 65 70 6c 6f 79 65 64 42 79 // "deployedBy"
    22 06 // one-byte string, length=6
    73 79 73 74 65 6d // "system"
    ...
    

    and when you do that long enough, you'll eventually see that stringa uses:

    22 05 // one-byte string, length=5
    62 6c 6f 63 6b // "block"
    4e // double
    0000000000002440 // little-endian encoding of "10.0" as an IEEE754 double
    

    whereas stringb uses:

    22 05 // one-byte string, length=5
    62 6c 6f 63 6b // "block"
    49 // int32
    14 // "zig-zag" encoding of "10"
    

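    So when you deserialize these strings to JavaScript objects, the integer 10 and the double 10.0 are of course indistinguishable, because in JS both are Numbers. The same difference shows up a second time further along, where block is 0: stringa stores it as a double (4e followed by eight zero bytes), whereas stringb stores it as an int32 (49 00).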
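To check this without decoding the whole string by hand, here is a minimal sketch that works on the two "block" fragments quoted above. The helper firstDifference is a hypothetical name introduced for illustration. The last two lines wrap each value in its own version-13 header (ff 0d) and assume the running Node version still accepts that format version, as it does for the strings in the question.

```javascript
const v8 = require('v8')

// Hypothetical helper: index of the first byte at which two buffers differ, or -1.
function firstDifference(x, y) {
  const n = Math.min(x.length, y.length)
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return i
  }
  return x.length === y.length ? -1 : n
}

// The two encodings of the "block" property: string tag, length, "block", value.
const fromA = Buffer.from('2205626c6f636b4e0000000000002440', 'hex')
const fromB = Buffer.from('2205626c6f636b4914', 'hex')

// The fragments diverge at the value tag: 4e (double) vs 49 (int32).
const diffAt = firstDifference(fromA, fromB)

// Decode each payload by hand.
const asDouble = fromA.readDoubleLE(diffAt + 1) // 8 bytes, little-endian IEEE 754
const zigzag = fromB[diffAt + 1]                // single-byte varint 0x14
const asInt32 = (zigzag >>> 1) ^ -(zigzag & 1)  // undo the zig-zag encoding

// The same comparison through V8 itself, one value per buffer.
const viaDouble = v8.deserialize(Buffer.from('ff0d4e0000000000002440', 'hex'))
const viaInt32 = v8.deserialize(Buffer.from('ff0d4914', 'hex'))

console.log(diffAt, asDouble, asInt32, Object.is(viaDouble, viaInt32))
```

Running firstDifference over the full stringa and stringb from the question should point at the first of the two places where they diverge.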


    Taking a step back: regardless of the specific explanation here, it's not a good idea to rely on specific behavior of a serialization format you don't control. It could change unpredictably. What exactly V8's serialize/deserialize API does under the hood is an internal implementation detail (which is also why there's no documentation about it; you have to read the source to figure it out) that can change. And, in fact, it does change! This string has serialization format version 13, which implies that there were 12 other versions before that, and a new version could be introduced any day.

    Even aside from new serialization format versions, I can think of several additional reasons why different encodings could deserialize to identical-looking JS objects (e.g. string encodings, NaN patterns, leading zeros in varints or BigInt data, optional ignorable bytes, ...).
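Two of these cases can be sketched with hand-built buffers. The byte sequences below are my own constructions rather than output captured from v8.serialize, and they assume the tag values 4e (double), 22 (one-byte string) and 63 (two-byte string) from the SerializationTag definition, plus a Node version that still accepts format version 13.

```javascript
const v8 = require('v8')

// Wrap a hex payload in a version-13 header (ff 0d) and deserialize it.
const decode = (hex) => v8.deserialize(Buffer.from('ff0d' + hex, 'hex'))

// Two different NaN bit patterns, both behind the double tag (4e).
const nanA = decode('4e000000000000f87f')
const nanB = decode('4e010000000000f87f')

// The string "a" as a one-byte string (22, length 1)
// and as a two-byte UTF-16LE string (63, byte length 2).
const strA = decode('220161')
const strB = decode('63026100')

console.log(Object.is(nanA, nanB), strA === strB)
```

In both pairs the input bytes differ, yet JavaScript code has no way to tell the resulting values apart.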

    If you need to make any guarantees about the serialization format of your objects, you should implement your own serialization algorithm, so that you can make sure that these guarantees actually hold.
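As a rough illustration of what such an algorithm could look like, the sketch below produces one fixed output string per logical value by sorting object keys and encoding BigInts explicitly. The function name canonicalize, the "$bigint" wrapper and the sample values are all assumptions made up for this example. A real format would also need a rule to keep ordinary objects with a "$bigint" key from colliding with encoded BigInts.

```javascript
// Hypothetical canonical encoder: equal logical values always yield the same string.
function canonicalize(value) {
  if (typeof value === 'bigint') return `{"$bigint":"${value}"}`
  if (typeof value === 'number') {
    if (!Number.isFinite(value)) throw new TypeError('non-finite numbers are not supported')
    return JSON.stringify(value) // 10 and 10.0 are the same Number, so both give "10"
  }
  if (value === null || typeof value === 'string' || typeof value === 'boolean') {
    return JSON.stringify(value)
  }
  if (Array.isArray(value)) return '[' + value.map(canonicalize).join(',') + ']'
  if (typeof value === 'object') {
    // Sort keys so that property insertion order cannot affect the output.
    const keys = Object.keys(value).sort()
    return '{' + keys.map((k) => JSON.stringify(k) + ':' + canonicalize(value[k])).join(',') + '}'
  }
  throw new TypeError('unsupported type: ' + typeof value)
}

// Same logical content, different key order and number literal.
const first = canonicalize({ block: 10, deposit: 200n, name: 'New Validator' })
const second = canonicalize({ name: 'New Validator', deposit: 200n, block: 10.0 })

console.log(first === second, first)
```

Because every choice in the encoding is made by your own code, two equal objects cannot end up with different serialized forms unless you change the algorithm yourself.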