python string binary encode python-simple-crypt

Transform ordinary string (which is supposed to be a binary string) back into a binary string

So I am currently working with the library: simple-crypt.

I have managed to transform a certain input string into its binary string.

from simplecrypt import encrypt

pw_data = input("Please type in your p!")  # enter password
pw_data_confirmed = input("Please confirm!")
_platform = input("Please tell me the platform!")  # platform the password belongs to
if pw_data == pw_data_confirmed:  # check confirmed pw
    print("Received!")

    salt_data = "AbCdEfkhl"  # salt key (passed to encrypt() as its password argument)

    ciphertext = encrypt(salt_data, pw_data.encode("utf8"))  # encrypt pw with salt key

Binary string e.g: b'sc\x00\x02X\xd8\x8ez\xbfB\x03s\xc5\x8bm\xecp\x19\x8d\xd6lqW\xf1\xc3\xa4y\x8f\x1aW)\x9bX\xfc\x0e\xa4\xf2ngJj/]{\x80\x06-\x07\x8cQ\xeef\x0b\x02?\x86\x19\x98\x94eW\x08}\x1d8\xdb\xe57\xf7\x97\x81\xb6\xc7\x08\n^\xc9\xc0'

This binary string is then stored in a Word document.

The problem now is: as soon as I read the document and get this specific binary string back, it is no longer recognized as a binary string. Instead, it is now of type str.

import docx
from simplecrypt import decrypt

p_loc = input("Which platform do you need?")
doc_existing = docx.Document(r"xxx")
text = []
for i in doc_existing.paragraphs:
    text.append(i.text)

for pos, i in enumerate(text):
    if i == p_loc:
        len_pos = len(text[pos+1])
        p_code = text[pos+1][2:len_pos-1]  # get the binary string which is of type ordinary string
print(p_code.encode("utf8"))  # when I apply .encode , another \ is added so I have for my binary code two \\


salt_data = "AbCdEfkhl"

plain = decrypt(salt_data, p_code)

print(plain)

p_code without .encode statement (as a string, not bytestring!): sc\x00\x02X\xd8\x8ez\xbfB\x03s\xc5\x8bm\xecp\x19\x8d\xd6lqW\xf1\xc3\xa4y\x8f\x1aW)\x9bX\xfc\x0e\xa4\xf2ngJj/]{\x80\x06-\x07\x8cQ\xeef\x0b\x02?\x86\x19\x98\x94eW\x08}\x1d8\xdb\xe57\xf7\x97\x81\xb6\xc7\x08\n^\xc9\xc0

When I now print out p_code.encode("utf8") I get the following result: b'sc\\x00\\x02X\\xd8\\x8ez\\xbfB\\x03s\\xc5\\x8bm\\xecp\\x19\\x8d\\xd6lqW\\xf1\\xc3\\xa4y\\x8f\\x1aW)\\x9bX\\xfc\\x0e\\xa4\\xf2ngJj/]{\\x80\\x06-\\x07\\x8cQ\\xeef\\x0b\\x02?\\x86\\x19\\x98\\x94eW\\x08}\\x1d8\\xdb\\xe57\\xf7\\x97\\x81\\xb6\\xc7\\x08\\n^\\xc9\\xc0'

So the problem is that, compared with the original binary string, a second \ has been added to every escape sequence. As a consequence, I am not able to decrypt this binary string, as it is no longer recognized as the original binary string.

So my question is: Is there a simple way to transform a string that already looks like a binary string back into an actual binary string, so that it is identical to the original? Or is there a way to remove the second \ so that I have the original binary string again?

I am very grateful for any help!!

Solution

Ok. So when you do f"{ciphertext}", you are telling Python to store the string representation of those bytes, as text, in the doc.

E.g.

>>> b = b"\x00\x01\x65\x66"
>>> print(f"{b}")
b'\x00\x01ef'
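
If the text has already been stored this way, it can usually be turned back into bytes. The following is only a minimal sketch, assuming the stored text is the complete representation including the leading b and the quotes (if those were stripped off, as in the question, they would have to be put back first). It also shows why calling .encode() on that text does not give the original bytes: the backslashes are ordinary characters at that point, so they are simply encoded as themselves. The variable names are made up for illustration.

```python
import ast

original = b"\x00\x01\x65\x66"

# What ends up in the document when the bytes are formatted as text:
# the characters b, ', \, x, 0, 0, ... rather than the raw byte values.
stored_text = f"{original}"

# Encoding that text only encodes the backslash characters themselves,
# which is why the printed result shows them doubled.
reencoded = stored_text.encode("utf8")

# literal_eval parses the text as a Python bytes literal and rebuilds the bytes.
recovered = ast.literal_eval(stored_text)

print(reencoded == original)  # False
print(recovered == original)  # True
```

ast.literal_eval only accepts Python literals, so unlike eval it does not run arbitrary code found in the document.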

You (probably) don't really want to store b'\x00\x01ef' in your word doc. A good general way to store binary data in text form is to use a different encoding. Base64 is a commonly used encoding that is intended to store binary data in a text-based form.

See https://docs.python.org/3/library/base64.html for more information.

In your case, you could do something like:

import base64

cipher_b64_b = base64.b64encode(ciphertext)
cipher_b64 = cipher_b64_b.decode() # cipher_b64 is now a string.
# Now store this cipher_b64 string in your word document

...

# Now you fetch p_code (which is now a base64 string) from your word doc
cipher_b64_b = p_code.encode()
cipher = base64.b64decode(cipher_b64_b)

This results in your original binary ciphertext. The Word document will contain a base64-encoded string like "AAFlZg==" (the encoding of the four example bytes above), which avoids the issues with escape sequences in your Word document.
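
As a quick check of the round trip, here is a small self-contained sketch that leaves out both simple-crypt and the Word document. The helper names bytes_to_text and text_to_bytes are hypothetical and only wrap the two base64 calls shown above; the sample bytes stand in for a real ciphertext.

```python
import base64


def bytes_to_text(data: bytes) -> str:
    """Hypothetical helper: encode raw bytes as a base64 string for storage."""
    return base64.b64encode(data).decode("ascii")


def text_to_bytes(text: str) -> bytes:
    """Hypothetical helper: decode a stored base64 string back into raw bytes."""
    return base64.b64decode(text.encode("ascii"))


sample = b"\x00\x01\x65\x66"      # stand-in for the real ciphertext
stored = bytes_to_text(sample)    # this string is what would go into the document
restored = text_to_bytes(stored)  # this is what would be passed to decrypt()

print(stored)              # AAFlZg==
print(restored == sample)  # True
```

Because the stored string contains only letters, digits, "+", "/" and "=", there are no backslashes or quotes to strip or re-escape when reading it back.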