
I have data in a hex dump but don't know the encoding, e.g. 0x91 0x05 = 657


I have some data as a hex dump. The left-hand side is the decimal value and the right-hand side is the hex dump.

16 = 10
51 = 33
164 = A4 01
388 = 84 03
570 = BA 04
657 = 91 05
1025 = 81 08
246172 = 9C 83 0F

How can I convert any such hex dump to decimal? In Perl, I tried using ord(), but it doesn't work.

Update: I don't know what this is called. It looks like 7-bit data. I tried to build a formula in Excel that looks like this:

DEC = hex2dec(X) + 128^1 * (hex2dec(Y) - 1) + 128^2 * (hex2dec(Z) - 1) + ...

Solution

  • What you have is a variable-length encoding. The length is signaled by a continuation bit: every byte of an encoded number except the last has its high bit set. The remaining 7 bits of each byte hold the number's unsigned binary value, least significant group first (little-endian order). The samples are consistent with the scheme known as unsigned LEB128.

    0xxxxxxx                   ⇒                   0xxxxxxx
    1xxxxxxx 0yyyyyyy          ⇒          00yyyyyy yxxxxxxx
    1xxxxxxx 1yyyyyyy 0zzzzzzz ⇒ 000zzzzz zzyyyyyy yxxxxxxx
    etc
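
    As a worked check of the layout above, here is a minimal sketch that decodes the two-byte sample 91 05 from the question by hand. The variable names are illustrative only.

    ```perl
    use strict;
    use warnings;

    # Sample from the question: 91 05 should decode to 657.
    my @bytes = ( 0x91, 0x05 );

    # Drop the continuation bit (0x80), keeping the 7 payload bits of each byte.
    my $low  = $bytes[0] & 0x7F;   # 0x91 -> 0x11 (17)
    my $high = $bytes[1] & 0x7F;   # 0x05 -> 0x05 (5)

    # The first byte holds the least significant 7 bits.
    my $num = ( $high << 7 ) | $low;   # 5 * 128 + 17

    print "$num\n";   # 657
    ```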
    

    The following can be used to decode a stream:

    use strict;
    use warnings;
    use feature qw( say );
    
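    # Removes the first encoded number from the front of the buffer ($_[0] is
    # modified in place) and returns its value, or an empty list if none matches.
    sub extract_first_num {
       # Zero or more continuation bytes (high bit set), then one final byte.
       $_[0] =~ s/^([\x80-\xFF]*[\x00-\x7F])//
          or return;
    
       my $encoded_num = $1;
       my $num = 0;
       # Start with the most significant 7-bit group, which is stored last.
       for (reverse unpack 'C*', $encoded_num) {
          $num = ( $num << 7 ) | ( $_ & 0x7F );
       }
    
       return $num;
    }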
    
    my $stream_buf = "\x10\x33\xA4\x01\x84\x03\xBA\x04\x91\x05\x81\x08\x9C\x83\x0F";
    while ( my ($num) = extract_first_num($stream_buf) ) {
       say $num;
    }
    
    # Leftover bytes mean the stream ended in the middle of a number.
    die("Bad data") if length($stream_buf);
    

    Output:

    16
    51
    164
    388
    570
    657
    1025
    246172
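
    For completeness, the reverse direction can be sketched in the same way. This is only an assumption about how the data was produced, inferred from the samples in the question. encode_num is a hypothetical helper, not part of the code above, and it handles non-negative integers only.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: encode a non-negative integer as 7-bit groups,
    # least significant group first, with the high bit set on all but the last byte.
    sub encode_num {
       my ($num) = @_;
       my @bytes;
       while ( $num > 0x7F ) {
          push @bytes, ( $num & 0x7F ) | 0x80;   # more bytes follow
          $num >>= 7;
       }
       push @bytes, $num;                         # last byte: high bit clear
       return pack 'C*', @bytes;
    }

    # Print each sample from the question in the same "DEC = HEX" form.
    for my $num ( 16, 51, 164, 388, 570, 657, 1025, 246172 ) {
       my $hex = join ' ', map { sprintf '%02X', $_ } unpack 'C*', encode_num($num);
       print "$num = $hex\n";
    }
    ```

    If the assumption holds, the printed lines should match the table in the question, and feeding the encoded bytes back into extract_first_num should return the original numbers.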