Tags: python, arrays, compression, huffman-code

How to convert a byte array back into a string?


I am writing a Huffman coding program. So far I have written only the compression part: it takes the text I want to compress, creates a code for each character, and replaces each character with its code. The result is my compressed text as a string of 0s and 1s, which I convert into a byte array using the following code:

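def make_byte_array(self, padded_text):
    # padded_text: string of '0'/'1' characters, assumed padded to a multiple of 8
    byte_array = bytearray()
    for i in range(0, len(padded_text), 8):
        # Parse each 8-character chunk as a base-2 integer (0-255)
        byte_array.append(int(padded_text[i:i + 8], 2))

    return byte_array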

I then write bytes(byte_array) to a .bin file. Now I want to open this binary file, read the bytes back, and turn them into the string form of my compressed text so that I can decompress it. The problem is that whenever I open and read the file, I get something like this:

b'\xad"\xfdK\xa8w\xc1\xec\xcb\xe5)\x1f\x1f\x92'

How would I go about converting this back into the string format of my compressed text?


Solution

  • If s is that byte string:

    for x in s:
        print(f'{x:08b}')
    

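    Instead of printing, you can do whatever you like with these strings of 0s and 1s, for example concatenate them with ''.join(...). Iterating over a bytes object yields integers from 0 to 255, and the 08b format spec renders each one as eight binary digits, keeping the leading zeros.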

    Going through strings of '0' and '1' characters for encoding and decoding is needlessly inefficient. You should instead assemble and disassemble the bytes directly using the bitwise operators (<<, >>, |, &).
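
Both suggestions can be sketched as follows. This is a minimal illustration rather than the asker's actual code: the function names bytes_to_bit_string, iter_bits and pack_bits are made up for this example, and it assumes bits are stored most significant bit first, which is what int(chunk, 2) in the question produces. Removing the padding bits is left out, because the padding scheme used by the encoder is not shown in the question.

```python
def bytes_to_bit_string(data):
    """Return the bits of `data` as a string of '0'/'1', eight per byte."""
    # Each byte is an int in 0..255; '08b' keeps the leading zeros.
    return ''.join(f'{byte:08b}' for byte in data)


def iter_bits(data):
    """Yield the bits of `data` as ints (0 or 1), most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            # Shift the wanted bit down to position 0, then mask it out.
            yield (byte >> shift) & 1


def pack_bits(bits):
    """Pack an iterable of 0/1 ints into bytes, zero-padding the last byte."""
    out = bytearray()
    current = 0  # bits collected so far for the byte being built
    count = 0    # how many bits `current` holds
    for bit in bits:
        current = (current << 1) | bit
        count += 1
        if count == 8:
            out.append(current)
            current = 0
            count = 0
    if count:
        # Left-align the leftover bits; the low bits become zero padding.
        out.append(current << (8 - count))
    return bytes(out)


# First four bytes of the example from the question.
data = b'\xad"\xfdK'
print(bytes_to_bit_string(data))
print(pack_bits(iter_bits(data)) == data)
```

To use this on the saved file, read it in binary mode (for example with open(path, 'rb') and .read(), where path is wherever the .bin file was written) and pass the resulting bytes object to bytes_to_bit_string or iter_bits. A decoder built on iter_bits can walk the Huffman tree one bit at a time without ever building the intermediate string.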