python algorithm math encryption unicode

Python Unicode encryption algorithm

I am building an encryption algorithm. It takes the message you want to encrypt and a numeric encryption key, which can be more than 8 digits long. Using 'for' in Python, it goes through the message one character at a time, putting each one in a variable called 'char'. It then checks whether ord(char) + key is odd or even. If it is even, it subtracts the key from ord(char). If it is odd, it adds the key. It converts the resulting number back to a character with chr(), and once every character has gone through this process, it displays the output to the user.

Now, you might have noticed a problem: suppose ord(char) + key is odd and the sum is more than 1114111 (the highest Unicode code point). In that case it uses this algorithm:

            checkpoint = key - (1114111 - ord(char))
            if checkpoint < 1114111:
               mes += chr(checkpoint)
            else:
                mes += chr(checkpoint % 1114111)

And if ord(char) + key is even and the key is more than ord(char), it uses this:

            checkpoint = key - (1114111 - ord(char))
            if checkpoint < 1114111:
               mes += chr(checkpoint)
            else:
                mes += chr(checkpoint % 1114111)

The problem with my program is that when I decrypt and ord(char) + key is odd and more than 1114111, it simply gives me the wrong result. My tests show this, and I can't figure out how to correct it. I would be extremely grateful if you could correct my code, point out a twist in this problem that I have missed, or suggest a new approach. I have posted the full code of both programs (encryption and decryption):

# encryption
code = str(input('Write your message, \t'))
key = int(input('Write your encryption key, use only numbers, \t'))
mes = ''
for char in code:
    if (ord(char) + key) % 2 != 0:
        if (ord(char) + key) <= 1114111:
            mes += chr(ord(char) + key)
        else:
            checkpoint = key - (1114111 - ord(char))
            if checkpoint < 1114111:
               mes += chr(checkpoint)
            else:
                mes += chr(checkpoint % 1114111)
    else:
        if ord(char) > key:
           mes += chr(ord(char) - key)
        else:
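            checkpost = (key - ord(char))
            if checkpost < 1114111:
                mes += chr(checkpost)
            else:
                mes += chr(key % 1114111)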
print(f'this is your encrypted messsage-   {mes}')

# decryption
code = str(input('Write your encrypted message, \t'))
key = int(input('Write your encryption key, \t'))
mes = ''
for char in code:
    if (ord(char) % 2) == (key % 2):
       if (ord(char) + key) % 2 != 0:
          if (ord(char) + key) <= 1114111:
             mes += chr(ord(char) + key)
          else:
            checkpoint = key - (1114111 - ord(char))
            if checkpoint < 1114111:
               mes += chr(checkpoint)
            else:
                mes += chr(checkpoint % 1114111)
       else:
           if ord(char) > key:
              mes += chr(ord(char) - key)
           else:
               checkpost = (key - ord(char))
               if checkpost < 1114111:
                  mes += chr(checkpost)
               else:
                   mes += chr(key % 1114111)
    else:
       if (ord(char) + key) % 2 == 0:
          if (ord(char) + key) <= 1114111:
             mes += chr(ord(char) + key)
          else:
              checkpoint = key - (1114111 - ord(char))
              if checkpoint < 1114111:
                 mes += chr(checkpoint)
              else:
                  mes += chr(checkpoint % 1114111)
       else:
           if ord(char) > key:
              mes += chr(ord(char) - key)
           else:
               checkpost = (key - ord(char))
               if checkpost < 1114111:
                  mes += chr(checkpost)
               else:
                   mes += chr(key % 1114111)
print(f'this is your decrypted messsage "{mes}"')

Solution

Don't do it this way.

Just encode your data to bytes first, using a character encoding of your preference (utf-8 covers every valid character).

Then write your encryption algorithm so that it works on the bytes, whose values range from 0 to 255 (or group them in pairs, quadwords, 16-byte groups with padding, whatever).
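To make the "work on bytes" idea concrete, here is a minimal sketch. The sample message is made up for illustration; the only thing it is meant to show is that encoded text is a sequence of integers that always fit in one byte.

```python
# Sample text (an assumption for this sketch), with two non-ASCII characters.
message = "Alô mundo! 🎄"

# Encode the text to UTF-8: the result is an immutable bytes object.
raw = message.encode("utf-8")

# Iterating over bytes yields plain integers, each between 0 and 255.
values = list(raw)
assert all(0 <= value <= 255 for value in values)

# Non-ASCII characters take more than one byte, so the two lengths differ.
print(len(message), len(raw))

# Decoding the untouched bytes gives back the original text.
assert raw.decode("utf-8") == message
```

Whatever the algorithm does afterwards, it only has to be reversible on values in range(256), which is a far friendlier space than Unicode code points.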

The Unicode code space IS NOT LINEAR and is not LINEARLY USABLE the way you want it to be. A value does not become a usable character just because it is below the highest theoretical code point, the number you repeat over and over again. (As a side note: repeating it like that is a terrible practice that would cost you several points in any interview. When you have a constant you need to refer to, keep it in a variable, so you can reach it by name.)
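On the side note about constants, a small sketch of what "reach it by name" could look like. The names MAX_CODE_POINT, CODE_POINT_COUNT and wrap are made up for this example; sys.maxunicode is the value Python itself exposes. This only illustrates naming the constant, it does not make code-point arithmetic a good idea.

```python
import sys

# sys.maxunicode is the largest code point Python supports (0x10FFFF).
MAX_CODE_POINT = sys.maxunicode

# Code points start at 0, so there is one more of them than the maximum value.
# A wrap-around would need this count as its modulus, not MAX_CODE_POINT.
CODE_POINT_COUNT = MAX_CODE_POINT + 1


def wrap(code_point: int) -> int:
    """Hypothetical helper: fold any integer back into 0..MAX_CODE_POINT."""
    return code_point % CODE_POINT_COUNT


print(MAX_CODE_POINT)  # 1114111
```

With a name in place, a mistake such as taking the modulus by 1114111 instead of 1114112 has to be fixed in one spot only.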

Anyway, there are several value ranges that simply won't work when trying chr(num), because they lead to illegal, reserved, or undefined characters. Even if you map them to characters with a valid glyph representation, more than 90% of them will be unprintable on the majority of systems, as very few people have installed all the fonts needed to display everything. So if you would like a "copy-pasteable" encrypted string, the point would be missed anyway. And in all cases, when you send your "encrypted" message over the network or store it in a file, it will be serialized to bytes nonetheless. (Also pay attention to the comments saying that encryption is hard and that whatever you are doing here is just a toy.)
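A quick way to see those holes for yourself, as a rough sketch: the three code points below are picked by hand as examples of a surrogate, a private-use character and a noncharacter.

```python
import unicodedata

# U+D800 is a surrogate. Python will build the str, but it is not valid text
# and cannot be encoded to UTF-8.
lone_surrogate = chr(0xD800)
try:
    lone_surrogate.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False

# General categories: Cs = surrogate, Co = private use, Cn = not assigned.
print(unicodedata.category(chr(0xD800)))  # Cs
print(unicodedata.category(chr(0xE000)))  # Co
print(unicodedata.category(chr(0xFFFF)))  # Cn
```

Any scheme that adds a key to ord(char) will sooner or later land on values like these, and the result cannot be stored or sent reliably.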

TL;DR:

Start by encoding your text message to bytes: `real_message = message.encode("utf-8")`
If you are to be minimally serious, compress those bytes to remove redundancy and so increase their entropy. Python's zlib does that:

In [27]: import zlib

In [28]: a = zlib.compress("Alô mundo!".encode("utf-8"))

In [29]: print(a)
b'x\x9cs\xcc9\xbcE!\xb74/%_\x11\x00\x1d#\x04\x89'

In [30]: print(zlib.decompress(a).decode("utf-8"))
Alô mundo!

(The byte string is larger than the original in this case because of the header structures zlib needs. It will be smaller for a paragraph-sized text.)
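If you want to check the size claim yourself, here is a small comparison. The paragraph text is made up for the sketch, and the exact byte counts depend on the zlib version and settings, so none are quoted here.

```python
import zlib

short = "Alô mundo!".encode("utf-8")

# Made-up paragraph: repeated natural-language text compresses well.
paragraph = (
    "the quick brown fox jumps over the lazy dog, "
    "and then the lazy dog chases the quick brown fox. " * 5
).encode("utf-8")

for data in (short, paragraph):
    packed = zlib.compress(data)
    # Compression is lossless: decompressing returns the exact input bytes.
    assert zlib.decompress(packed) == data
    print(len(data), "->", len(packed))
```

The short input grows because of the fixed overhead, while the paragraph shrinks.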

Now it is your time to shine: be creative, and apply whatever you want to "disguise" those bytes. You can use the bytearray Python built-in to easily store modifiable bytes. You can use Python's struct module to group bytes into 4- or 8-byte unsigned (or signed) integers, or array.array to do the same.
When you are done, you still have a byte string. If you want it to be printable, the correct thing to do is to map the bytes to the printable range using base64 or base85 (Python's stdlib base64 module contains encoders and decoders for both):

printable_message = base64.b85encode(bytes_encrypted_message).decode("ASCII")
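For completeness, a round trip through base85, as a sketch. The input bytes are arbitrary values chosen for the example, standing in for whatever your own algorithm produced.

```python
import base64

# Arbitrary bytes (an assumption for this sketch), several of them unprintable.
bytes_encrypted_message = bytes([0, 159, 255, 10, 27, 200, 7])

# b85encode returns bytes made only of printable ASCII characters.
printable_message = base64.b85encode(bytes_encrypted_message).decode("ascii")
print(printable_message)
assert printable_message.isascii() and printable_message.isprintable()

# b85decode reverses the mapping and returns the original bytes.
recovered = base64.b85decode(printable_message)
assert recovered == bytes_encrypted_message
```

This step is only about transport: it adds no secrecy, it just makes the result safe to copy and paste.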

Be happy, and play around with the "disguise" step above.
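As a starting point for playing with the "disguise" step, here is a rough sketch of the two tools mentioned above. The "add 1 to every byte" transformation and the little-endian 32-bit grouping are arbitrary choices for illustration, not a recommended scheme.

```python
import struct

data = bytearray("secret".encode("utf-8"))

# bytearray is mutable: change each byte in place (here: add 1 modulo 256).
for i, value in enumerate(data):
    data[i] = (value + 1) % 256

# Reverse it by subtracting 1 modulo 256.
restored = bytearray((value - 1) % 256 for value in data)
assert restored.decode("utf-8") == "secret"

# struct groups bytes into fixed-size integers, so pad to a multiple of 4 first.
padded = bytes(data) + b"\x00" * (-len(data) % 4)
words = struct.unpack(f"<{len(padded) // 4}I", padded)  # little-endian uint32
print(words)

# Packing the integers again gives back exactly the padded bytes.
assert struct.pack(f"<{len(words)}I", *words) == padded
```

Whatever you do here, keep every operation reversible and modulo 256 (or modulo the integer size you grouped into), so that decryption can undo it step by step.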

Complete example using a simple "xor" encryption (Python's ^ operator), without compression:

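import base64
from itertools import cycle

# The key and the message are both turned into bytes before anything else.
secret = "santaclausisfat".encode("utf-8")
message = "this is the very secret message"
raw_message = message.encode("utf-8")

# XOR each message byte with the matching key byte; cycle() repeats the key.
encrypted_bytes = bytearray(byte ^ key_byte for byte, key_byte in zip(raw_message, cycle(secret)))

print(encrypted_bytes)

# Map the raw bytes to printable ASCII.
encrypted_text = base64.b64encode(encrypted_bytes).decode()
print(encrypted_text)
# YAY!!

# Decryption: undo base64, then XOR with the same key again.
original = bytes(byte ^ key_byte for byte, key_byte in zip(base64.b64decode(encrypted_text.encode()), cycle(secret))).decode("utf-8")
print(original)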
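If you want the compression step back in, the whole chain could be wrapped in two functions along these lines. This is a sketch only: the names xor_bytes, encrypt and decrypt are made up here, the key must not be empty, and repeating-key XOR remains a toy, not real encryption.

```python
import base64
import zlib
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key as needed."""
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))


def encrypt(message: str, secret: str) -> str:
    """Encode to UTF-8, compress, XOR with the key, then make it printable."""
    raw = zlib.compress(message.encode("utf-8"))
    return base64.b85encode(xor_bytes(raw, secret.encode("utf-8"))).decode("ascii")


def decrypt(token: str, secret: str) -> str:
    """Undo encrypt(): base85-decode, XOR again, decompress, decode the text."""
    raw = xor_bytes(base64.b85decode(token), secret.encode("utf-8"))
    return zlib.decompress(raw).decode("utf-8")


token = encrypt("this is the very secret message", "santaclausisfat")
print(token)
assert decrypt(token, "santaclausisfat") == "this is the very secret message"
```

Decryption applies the same steps in reverse order, which is what makes the round trip work: XOR undoes itself, and zlib and base85 each have an exact inverse.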