Tags: python, python-2.7, python-3.x, text, binary

Binary to String/Text in Python


I have searched many times online and I have not been able to find a way to convert my binary string variable, X

X = "1000100100010110001101000001101010110011001010100"

into a UTF-8 string value.

I have found that some people are using methods such as

b'message'.decode('utf-8')

however, this method has not worked for me: 'b' is reported as nonexistent, and I am not sure how to replace 'message' with a variable. On top of that, I do not understand how this method works. Is there a better alternative?

So how could I convert a binary string into a text string?

EDIT: I also do not mind ASCII decoding

CLARIFICATION: Here is specifically what I would like to happen.

def binaryToText(z):
    # Some code to convert binary to text
    return (something here);
X="0110100001101001"
print binaryToText(X)

This would then yield the string...

hi

Solution

  • It looks like you are trying to decode ASCII characters from a binary string representation (bit string) of each character.

    You can take each block of eight characters (a byte), convert that to an integer, and then convert that to a character with chr():

    >>> X = "0110100001101001"
    >>> print(chr(int(X[:8], 2)))
    h
    >>> print(chr(int(X[8:], 2)))
    i
    

    Assuming that the values encoded in the string are ASCII this will give you the characters. You can generalise it like this:

    def decode_binary_string(s):
        return ''.join(chr(int(s[i*8:i*8+8],2)) for i in range(len(s)//8))
    
    >>> decode_binary_string(X)
    'hi'
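
One caveat with the generalised function: range(len(s)//8) silently ignores any trailing bits when the length of the input is not a multiple of eight. A minimal sketch of a stricter splitter is shown here; split_bytes is a hypothetical helper name, not part of the original answer, and rejecting bad input with ValueError is an assumption about what the caller wants.

```python
def split_bytes(bits):
    """Split a bit string into 8-character chunks, rejecting bad input.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Reject anything other than the characters '0' and '1'.
    if set(bits) - {"0", "1"}:
        raise ValueError("expected only '0' and '1' characters")
    # Reject leftover bits instead of silently dropping them.
    if len(bits) % 8:
        raise ValueError("length %d is not a multiple of 8" % len(bits))
    return [bits[i:i + 8] for i in range(0, len(bits), 8)]
```

Each returned chunk can then be passed to int(chunk, 2) as in the answer above.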
    

    In Python 2 the result is a byte string (str), so if you want to keep the data in its original encoding you don't need to decode any further. More usually you would convert the incoming bytes into a Python unicode string, and that can be done like this (Python 2):

    def decode_binary_string(s, encoding='UTF-8'):
        byte_string = ''.join(chr(int(s[i*8:i*8+8],2)) for i in range(len(s)//8))
        return byte_string.decode(encoding)
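
The function above relies on Python 2, where chr() returns a byte and str has a decode() method. On Python 3, chr() returns a text character instead, so a rough equivalent might collect the integers into a bytes object first. This is only a sketch: it assumes Python 3 and an input whose length is a multiple of eight, and the name decode_binary_string_py3 is chosen here for illustration.

```python
def decode_binary_string_py3(s, encoding="utf-8"):
    # Convert each 8-character chunk to an integer in 0..255,
    # collect the integers into a bytes object, then decode to text.
    # Assumes len(s) is a multiple of 8 (not checked here).
    raw = bytes(int(s[i:i + 8], 2) for i in range(0, len(s), 8))
    return raw.decode(encoding)


print(decode_binary_string_py3("0110100001101001"))  # hi
```

Going through bytes rather than joining chr() results means multi-byte UTF-8 sequences are decoded as whole characters instead of one character per byte.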