
python - Steganography - UnicodeDecode Error


I am writing a Python script to hide data in an image. It stores the message two bits at a time in the two least significant bits of the red channel of each pixel of a .PNG. The script works fine for lowercase letters but produces an error when the message contains a full stop:

Traceback (most recent call last):
  File "E:\Python\Steganography\main.py", line 65, in <module>
    print(unhide('coded-img.png'))
  File "E:\Python\Steganography\main.py", line 60, in unhide
    message = bin2str(binary)
  File "E:\Python\Steganography\main.py", line 16, in bin2str
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 6: invalid start byte

Here is my code:

from PIL import Image

def str2bin(message):
    binary = bin(int.from_bytes(message.encode('utf-8'), 'big'))
    return binary[2:]

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()

def hide(filename, message):
    image = Image.open(filename)
    binary = str2bin(message) + '00000000'

    data = list(image.getdata())

    newData = []

    index = 0
    for pixel in data:
        if index < len(binary):
            pixel = list(pixel)
            pixel[0] >>= 2
            pixel[0] <<= 2
            pixel[0] += int('0b' + binary[index:index+2], 2)
            pixel = tuple(pixel)
            index += 2

        newData.append(pixel)

    print(binary)

    image.putdata(newData)
    image.save('coded-'+filename, 'PNG')

def unhide(filename):
    image = Image.open(filename)
    data = image.getdata()

    binary = '0'

    index = 0

    while binary[-8:] != '00000000':
        binary += bin(data[index][0])[-2:]
        index += 1

    binary = binary[:-1]

    print(binary)
    print(index*2)

    message = bin2str(binary)
    return message


hide('img.png', 'alpha.')
print(unhide('coded-img.png'))

Please help. Thanks!


Solution

  • There are at least two problems with your code.

    The first problem is that your encoding can be misaligned by 1 bit since leading null bits are not included in the output of the bin() function:

    >>> bin(int.from_bytes('a'.encode('utf-8'), 'big'))[2:]
    '1100001'
    # This string is of odd length and (when joined with the terminating '00000000')
    # turns into a still odd-length '110000100000000' which is then handled by your
    # code as if there was an extra trailing zero (so that the length is even).
    # During decoding you compensate for that effect with the
    #
    #       binary = binary[:-1]
    #
    # line. The latter is responsible for your stated problem when the binary
    # representation of your string is in fact of even length and doesn't need
    # the extra bit as in the below example:
    >>> bin(int.from_bytes('.'.encode('utf-8'), 'big'))[2:]
    '101110'
    

    You should instead pad your binary string to an even length by prepending an extra null bit when needed.
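    A minimal sketch of that fix, reusing the question's str2bin name, could look like:

```python
def str2bin(message):
    # Binary representation without the '0b' prefix; bin() strips any
    # leading null bits, so the length may come out odd.
    binary = bin(int.from_bytes(message.encode('utf-8'), 'big'))[2:]
    # Prepend a null bit when needed so the stream always splits
    # cleanly into the 2-bit chunks written into each pixel.
    if len(binary) % 2:
        binary = '0' + binary
    return binary
```

    With this change 'a' encodes to the even-length '01100001', so no compensating binary[:-1] trim is needed while decoding.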

    The other problem is that, while restoring the hidden message, the stopping condition binary[-8:] == '00000000' can be satisfied prematurely when the trailing bits of one (partially restored) symbol run into the leading bits of the next symbol. This can happen, for example, in the following cases:

    • the symbol @ (ASCII code 64, i.e. its 6 low-order bits unset) followed by any character with an ASCII code below 64 (i.e. its 2 high-order bits unset);

    • a space character (ASCII code 32, i.e. its 5 low-order bits unset) followed by a linefeed/newline character (ASCII code 10, i.e. its 4 high-order bits unset).
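    The first case is easy to demonstrate directly (using '!', ASCII code 33, as the follower; any code below 64 would do):

```python
# '@' (0x40) contributes six unset low-order bits; '!' (0x21, < 64)
# contributes two unset high-order bits, so together they form a run
# of eight zero bits that mimics the '00000000' terminator.
bits = bin(int.from_bytes('@!'.encode('utf-8'), 'big'))[2:]
print(bits)                 # 100000000100001
print('00000000' in bits)   # True, even though neither byte is zero
```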

    You can fix that bug by additionally requiring that a whole number of bytes has been decoded by the time the last 8 bits appear to all be unset:

    while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
        # ...
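    Putting the two fixes together, here is one possible corrected sketch. One assumption worth flagging: for the len(binary) % 8 test to line up with the terminator, the bit string is padded all the way to a whole number of bytes, not merely to even length. The helper decode_stream below is a stand-in for unhide() that works on a plain list of red-channel values (e.g. [p[0] for p in image.getdata()]), so the sketch runs without an image file:

```python
def str2bin(message):
    bits = bin(int.from_bytes(message.encode('utf-8'), 'big'))[2:]
    # Restore the leading null bits stripped by bin() so the encoded
    # stream stays byte-aligned for the decoder's len % 8 check.
    return bits.zfill((len(bits) + 7) // 8 * 8)

def bin2str(binary):
    n = int(binary, 2)
    return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode()

def decode_stream(red_values):
    binary = ''
    index = 0
    # Stop only on a byte-aligned run of eight zero bits.
    while not (len(binary) % 8 == 0 and binary[-8:] == '00000000'):
        # format(..., '08b') has no '0b' prefix, unlike bin(value)[-2:],
        # which goes wrong for red values of 0 and 1.
        binary += format(red_values[index], '08b')[-2:]
        index += 1
    return bin2str(binary[:-8])  # drop the terminator byte

# Round trip: emulate hide() by packing 2 bits into each red value.
payload = str2bin('alpha.') + '00000000'
reds = [int(payload[i:i + 2], 2) for i in range(0, len(payload), 2)]
print(decode_stream(reds))  # alpha.
```

    Because the stop check now fires only at byte boundaries, even a message such as '@!', whose bit stream contains a run of eight zeros, decodes correctly.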