Convert binary to signed, little endian 16bit integer in Python


I'm trying to convert a nested list of binary strings into signed 16-bit little-endian integers.

input_data = [['1100110111111011','1101111011111111','0010101000000011'],['1100111111111011','1101100111111111','0010110100000011']]
Desired Output =[[-1075, -34, 810],[-1073, -39, 813]]

This is what I've got so far. It's been adapted from: Hex string to signed int in Python 3.2?, Conversion from HEX to SIGNED DEC in python

results = []
for i in input_data:
   hex_convert = [hex(int(x,2)) for x in i]
   convert = [int(y[4:6] + y[2:4], 16) for y in hex_convert]
   results.append(convert)
print (results)
output: [[64461, 65502, 810], [64463, 65497, 813]]

This works fine, but the results are unsigned integers. I need signed integers capable of handling negative values. I then tried a different approach:

results_2 = []
for i in input_data:
   hex_convert = [hex(int(x,2)) for x in i]
   to_bytes = [bytes(j, 'utf-8') for j in hex_convert]
   split_bits = [int(k, 16) for k in to_bytes]
   convert_2 = [int.from_bytes(b, byteorder = 'little', signed = True) for b in to_bytes]
   results_2.append(convert_2)
print (results_2)
Output: [[108191910426672, 112589973780528, 56282882144304], [108191943981104, 112589235583024, 56282932475952]]

This result is even more wild than the first. I know my approach is wrong, and it doesn't help that I've never been able to get my head around binary conversion, but I feel I'm on the right path with:

(b, byteorder = 'little', signed = True)

but I can't work out where I'm going wrong. Any help explaining this concept would be greatly appreciated.


Solution

  • This result is even more wild than the first. I know my approach is wrong... but can't work out where I'm wrong.

    The problem is in the conversion to bytes. Let's look at it a step at a time:

    int(x, 2)
    

    Fine; we treat the string as a base-2 representation of an integer value, and get that integer. The only problem is that this reads the bits as a) unsigned and b) big-endian (most significant byte first).
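
    As a small illustration of that first step, here is what int(x, 2) gives for the first string in the question's data. The variable names are only for this sketch.

```python
# Parse one of the question's 16-bit strings as a base-2 number.
bits = '1100110111111011'
value = int(bits, 2)

print(value)       # 52731
print(hex(value))  # 0xcdfb

# The leftmost character is taken as the most significant bit, and the
# result is never negative, even though the top bit of the string is set.
is_non_negative = value >= 0
print(is_non_negative)  # True
```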

    hex(int(x,2))
    

    What this does is create a string representation of the integer, in base 16, with a 0x prefix. Notably, there are two text characters for each byte that we want. This is already heading down the wrong path.

    You might have thought of using hexadecimal because you've seen \xAB style escapes inside string representations. This is a completely different thing. The string '\xAB' contains one character. The string '0xAB' contains four.
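
    The difference is easy to check directly. A minimal sketch, with variable names chosen just for this example:

```python
escaped = '\xAB'  # an escape sequence: one character with code point 0xAB
literal = '0xAB'  # plain text: the four characters '0', 'x', 'A', 'B'

print(len(escaped))  # 1
print(len(literal))  # 4
print(ord(escaped))  # 171, which is 0xAB
```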

    From there, everything else is still nonsense. Converting to bytes with a text encoding just means that each text character is replaced by its encoded byte value: the character 0, for example, becomes the byte value 48 (since in UTF-8 it's encoded as a single byte with that value). For this data we get the same results with UTF-8 that we would by assuming plain ASCII (since UTF-8 is "ASCII transparent" and there are no non-ASCII characters in the text).
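
    To see where the huge numbers come from, this sketch retraces the question's second attempt for the first input string only. It reuses the same calls as the question; the variable names are just for illustration.

```python
hex_text = hex(int('1100110111111011', 2))
print(hex_text)  # 0xcdfb

# Encoding the text yields one byte per character, including '0' and 'x'.
encoded = bytes(hex_text, 'utf-8')
print(list(encoded))  # [48, 120, 99, 100, 102, 98]

# from_bytes then treats those six character codes as one 48-bit integer.
wild = int.from_bytes(encoded, byteorder='little', signed=True)
print(wild)  # 108191910426672
```

    That last value is the first number in the question's second output.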


    So how do we do it?

    We want to convert the integer from the first step into the bytes used to represent it. Just as there is a .from_bytes class method allowing us to create an integer from underlying bytes, there is an instance method allowing us to get the bytes that would represent the integer.

    So, we use .to_bytes, specifying the length, signedness and endianness that were assumed when we created the int from the binary string - that gives us bytes that correspond to that string. Then, we re-create the integer from those bytes, except now specifying the proper signedness and endianness. The reason that .to_bytes makes us specify a length is that the integer doesn't have a particular length - there is a minimum number of bytes required to represent it, but you could use as many more as you like. (This is especially important if you want to handle signed values, since it will do sign-extension automatically.)
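
    A short sketch of how the length argument behaves, using the first value from the question plus an arbitrary negative number (-2) to show the sign extension. The variable names are illustrative only.

```python
value = int('1100110111111011', 2)  # 52731, i.e. 0xCDFB

# Exactly two bytes: the same bit pattern as the input string.
two_bytes = value.to_bytes(2, byteorder='big', signed=False)
print(two_bytes.hex())  # cdfb

# Asking for more bytes pads an unsigned value with zero bytes.
four_bytes = value.to_bytes(4, byteorder='big', signed=False)
print(four_bytes.hex())  # 0000cdfb

# With signed=True, a negative value is padded with 0xff bytes instead.
negative = (-2).to_bytes(4, byteorder='big', signed=True)
print(negative.hex())  # fffffffe

# Asking for fewer bytes than the value needs raises OverflowError.
try:
    value.to_bytes(1, byteorder='big', signed=False)
    too_small_failed = False
except OverflowError:
    too_small_failed = True
print(too_small_failed)  # True
```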

    Thus:

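    results_2 = []
    for i in input_data:
        # Parse each string as an unsigned, big-endian integer.
        values = [int(x,2) for x in i]
        # Recover the two bytes that the string actually describes.
        as_bytes = [x.to_bytes(2, byteorder='big', signed=False) for x in values]
        # Reinterpret those bytes as a signed, little-endian integer.
        reinterpreted = [int.from_bytes(x, byteorder='little', signed=True) for x in as_bytes]
        results_2.append(reinterpreted)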
    

    But let's improve the organization of the code a bit. I will first make a function to handle a single integer value, and then we can use comprehensions to process the list. In fact, we can use nested comprehensions for the nested list.

    def as_signed_little(binary_str):
        # This time, taking advantage of positional args and default values.
        as_bytes = int(binary_str, 2).to_bytes(2, 'big')
        return int.from_bytes(as_bytes, 'little', signed=True)
    
    # And now we can do:
    results_2 = [[as_signed_little(x) for x in i] for i in input_data]
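
    As a sanity check, the sketch below runs that function on the question's data and compares it against the standard library's struct module. The struct-based helper is not part of the approach above; it is a hypothetical cross-check added here, relying on the '<h' format code (little-endian signed 16-bit).

```python
import struct

input_data = [
    ['1100110111111011', '1101111011111111', '0010101000000011'],
    ['1100111111111011', '1101100111111111', '0010110100000011'],
]

def as_signed_little(binary_str):
    as_bytes = int(binary_str, 2).to_bytes(2, 'big')
    return int.from_bytes(as_bytes, 'little', signed=True)

def as_signed_little_struct(binary_str):
    # Hypothetical cross-check: let struct reinterpret the same two bytes.
    raw = int(binary_str, 2).to_bytes(2, 'big')
    return struct.unpack('<h', raw)[0]

results = [[as_signed_little(x) for x in row] for row in input_data]
checked = [[as_signed_little_struct(x) for x in row] for row in input_data]

print(results)             # [[-1075, -34, 810], [-1073, -39, 813]]
print(results == checked)  # True
```

    The first value in each row agrees with the unsigned results from the question's first attempt: 64461 - 65536 = -1075 and 64463 - 65536 = -1073.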