I am writing a Huffman coding program. So far I have only written the compression part: as expected, it takes the text I want to compress, creates a code for each character, and replaces each character with its respective code. The result is my compressed text as a string of 0s and 1s, which I convert into a byte array using the following code:
def make_byte_array(self, padded_text):
    byte_array = bytearray()
    for i in range(0, len(padded_text), 8):
        byte_array.append(int(padded_text[i:i + 8], 2))
    return byte_array
I then save the byte_array to a .bin file by writing bytes(byte_array). I now want to open this binary file, read the bytes back, and turn them into the string format of my compressed text so that I can decompress it. The problem is that whenever I open and read this binary file, I get something like this:
b'\xad"\xfdK\xa8w\xc1\xec\xcb\xe5)\x1f\x1f\x92'
How would I go about converting this back into the string format of my compressed text?
If s is that byte string:
for x in s:
    print(f'{x:08b}')
Instead of print, you can do what you like with the strings of 0's and 1's.
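For example, joining those per-byte strings gives back a single string of bits, which is the inverse of make_byte_array from the question. This is only a minimal sketch: the helper name bytes_to_bit_string is made up for illustration, and any padding bits added before compression still have to be removed according to whatever padding scheme the compressor used, which the question does not show.

```python
def bytes_to_bit_string(data):
    """Return the bits of `data` as one string of '0'/'1', 8 characters per byte."""
    # Iterating over a bytes object yields ints in range(256);
    # the format spec '08b' renders each one as zero-padded binary.
    return ''.join(f'{byte:08b}' for byte in data)


# The byte string from the question, as read back from the .bin file.
s = b'\xad"\xfdK\xa8w\xc1\xec\xcb\xe5)\x1f\x1f\x92'
bits = bytes_to_bit_string(s)

# Sanity check: packing the bits again the way the question does
# reproduces the original bytes.
repacked = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

The zero padding in '08b' matters: without it, a byte such as 0x1f would come out as '11111' and shift every bit after it.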
It is unnecessarily inefficient to go through strings of 0 and 1 characters for encoding and decoding. You should instead assemble and disassemble the bytes directly using the bit operators (<<, >>, |, &).
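A rough sketch of what that could look like follows. It assumes each Huffman code is available as a (value, length) pair of integers rather than as a string; that representation and the names pack_codes and iter_bits are assumptions for illustration, not something from the question. Bits are written most significant bit first, and the last byte is padded with zero bits.

```python
def pack_codes(codes):
    """Pack (value, length) code pairs into bytes, most significant bit first."""
    out = bytearray()
    buffer = 0  # pending bits, kept in the low `count` bits
    count = 0   # number of pending bits in `buffer`
    for value, length in codes:
        # Append the new code to the right of the pending bits.
        buffer = (buffer << length) | value
        count += length
        # Emit every complete byte, taking it from the top of the pending bits.
        while count >= 8:
            count -= 8
            out.append((buffer >> count) & 0xFF)
        # Keep only the bits that have not been written yet.
        buffer &= (1 << count) - 1
    if count:
        # Left-align the leftover bits and pad the final byte with zeros.
        out.append((buffer << (8 - count)) & 0xFF)
    return bytes(out)


def iter_bits(data):
    """Yield the bits of `data` as ints (0 or 1), most significant bit first."""
    for byte in data:
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1
```

A decoder can then walk the Huffman tree one bit at a time from iter_bits without ever building a string. As with the string approach, the decoder still needs some way to know how many padding bits the final byte holds.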