ruby hex byte endianness pack

Ruby pack and unpack hex value does not return the same value?


I have a hex string of unknown (variable) length, and I want to be able to pack and unpack it at any time to convert it to and from bytes.

["a"].pack("H*")
# => "\xA0"

I'm getting \xA0 -- is that because it is little-endian? I was expecting \xA or \x0A.

Similarly, I get the hex string a0 when I unpack the result again:

["a"].pack("H*").unpack("H*").first
# => "a0"

Again, I was expecting a or 0a, so I'm a bit confused. Is this all the same?

I would prefer big-endian for hex strings but it appears that .pack does not accept endianness for H:

["a"].pack("H>*").unpack("H>*")
ArgumentError: '>' allowed only after types sSiIlLqQjJ (ArgumentError)
from <internal:pack>:8:in `pack'

How can I get big-endian hex values from unpack?


Solution

  • Let's collect some facts.


    First, from How to identify the endianness of given hex string? :

    "Bytes don't have endianness." – @MichaelPetch

    – @VC.One

    You only get endianness once you start stringing bytes together. So that's not at issue here.
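
    To make that concrete, here is a small sketch using a 16-bit integer directive, which (unlike H) does accept the > and < modifiers. The helper names and the value 0x0102 are only illustrative assumptions, not anything from the question.

    ```ruby
    # Byte order only shows up once a value spans more than one byte.
    # Hypothetical helpers: pack a 16-bit unsigned integer in each byte order.
    def be16_bytes(n)
      [n].pack("S>").bytes # big-endian: most significant byte first
    end

    def le16_bytes(n)
      [n].pack("S<").bytes # little-endian: least significant byte first
    end

    be16_bytes(0x0102) # => [1, 2]
    le16_bytes(0x0102) # => [2, 1]

    # A single byte has nothing to reorder.
    [0x0A].pack("C").bytes # => [10]
    ```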


    Next, from What does ["string"].pack('H*') mean? :

    [Array.pack] interprets the string as hex numbers, two characters per byte, and converts it to a string with the characters with the corresponding ASCII code.

    So your string "a", being one character, doesn't describe even one full byte.
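
    As a quick check of the "two characters per byte" rule, the sketch below counts the bytes produced for a few inputs. The helper name byte_count is made up for this illustration.

    ```ruby
    # Hypothetical helper: how many bytes does a hex string pack into?
    def byte_count(hex)
      [hex].pack("H*").bytesize
    end

    ["4142"].pack("H*") # => "AB" (0x41 is "A", 0x42 is "B")
    byte_count("4142")  # => 2
    byte_count("41")    # => 1
    byte_count("a")     # => 1 (half a byte of input still yields one whole byte)
    ```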


    Finally, from the packed_data docs :

    ['fff'].pack('h3') # => "\xFF\x0F"
    ['fff'].pack('h4') # => "\xFF\x0F"
    ['fff'].pack('h5') # => "\xFF\x0F\x00"
    ['fff'].pack('h6') # => "\xFF\x0F\x00"
    

    This shows that input strings that are

    1. an odd number of characters, or
    2. shorter than the length specified in the pattern,

    are treated as though they were right-padded with 0.


    Putting all this together: Array#pack is, in effect, padding your too-short input string "a" with a 0 on the right so that it describes a full byte, and everything from there on treats the input as the string "a0".

    If you're not satisfied with that behavior, the one lever you can pull is to swap H* for h*, which according to the docs trades "high nibble first" for "low nibble first."

    Here's an illustration of the effects of that change. (I'll use f instead of a, because \x0A gets rewritten as \n, making the effect harder to see.)

    # Determines how order of nibbles ("half-bytes") is interpreted
    ["f0"].pack("H*") # => "\xF0"
    ["f0"].pack("h*") # => "\x0F"
    ["0f"].pack("H*") # => "\x0F"
    ["0f"].pack("h*") # => "\xF0"
    
    # Always right-pads input ("f" matches behavior of "f0…", never "…0f")
    ["f"].pack("H*") # => "\xF0"
    ["f"].pack("h*") # => "\x0F"
    ["f"].pack("H4") # => "\xF0\x00"
    ["f"].pack("h4") # => "\x0F\x00"
    
    # Changes nothing in the round-trip conversion
    ["f0"].pack("H*").unpack("H*") # => ["f0"]
    ["f0"].pack("h*").unpack("h*") # => ["f0"]
    ["0f"].pack("H*").unpack("H*") # => ["0f"]
    ["0f"].pack("h*").unpack("h*") # => ["0f"]
    

    It seems like this nibble-ordering is what you had in mind when you asked about endianness, so I hope this helps. However, note that whichever nibble order you choose, a 1-character input string will always be right-padded with a zero, never left-padded.
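
    If what you actually want is "a" to come out as 0a, one possible workaround is to left-pad odd-length input yourself before packing. This is only a minimal sketch under that assumption; pack_hex_left_padded is a name I made up, not part of Ruby.

    ```ruby
    # Hypothetical helper: left-pad odd-length hex strings with "0" so that
    # "a" is read as "0a" rather than "a0". The caller's string is not mutated.
    def pack_hex_left_padded(hex)
      hex = "0" + hex if hex.length.odd?
      [hex].pack("H*")
    end

    pack_hex_left_padded("a").unpack1("H*") # => "0a"
    pack_hex_left_padded("abc").bytes       # => [10, 188]
    pack_hex_left_padded("f0").bytes        # => [240]
    ```

    Even-length input passes through unchanged, so the round trip through unpack("H*") returns exactly the padded string.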