
Python: Converting a string to the octet format


I am trying to implement the OS2IP (Octet String to Integer Primitive) algorithm from PKCS #1 in Python. However, I do not know how to convert a character string, such as "Men of few words are the best men.", into the octet format.


Solution

  • Use the .encode() method of str. For example:

    "öä and ü".encode("utf-8")
    

    displays

    b'\xc3\xb6\xc3\xa4 and \xc3\xbc'
    

    To convert these bytes to an integer, which is what OS2IP does, use the int.from_bytes() method with 'big' byte order, because OS2IP treats the first octet as the most significant. For example:

    the_bytes = "öä and ü".encode("utf-8")
    the_int = int.from_bytes(the_bytes, 'big')
    print(the_int)
    

    displays

    236603614466389086088250300
    

    In preparing for RSA encryption, a padding scheme (such as RSAES-OAEP or RSAES-PKCS1-v1_5 from PKCS #1) is typically applied to the encoded bytes to pad them out to the size of the RSA modulus, and then the padded byte array is converted to an integer. This padding step is critical to the security of RSA cryptography.
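PKCS #1 (RFC 8017, section 4) defines OS2IP together with its inverse, I2OSP, and both map directly onto Python's int.from_bytes() and int.to_bytes() with big-endian byte order. The following is a minimal sketch based on that reading of the spec; the function names os2ip and i2osp are chosen here for illustration and are not a library API.

```python
def i2osp(x: int, x_len: int) -> bytes:
    """Integer-to-Octet-String: big-endian, left-padded with zero octets to x_len."""
    if x < 0 or x >= 256 ** x_len:
        raise ValueError("integer too large")
    return x.to_bytes(x_len, "big")


def os2ip(octets: bytes) -> int:
    """Octet-String-to-Integer: the first octet is the most significant."""
    return int.from_bytes(octets, "big")


message = "Men of few words are the best men."
octets = message.encode("utf-8")   # text -> octet string
x = os2ip(octets)                  # octet string -> integer
print(len(octets), x)

# I2OSP with the original length recovers the exact octets,
# and decoding those octets recovers the original text.
assert i2osp(x, len(octets)).decode("utf-8") == message
```

OS2IP discards leading zero octets (b"\x00\x01" and b"\x01" both map to 1), which is why I2OSP takes an explicit output length; in RSA that length is the modulus length in octets.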
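To make the padding step concrete, here is an illustrative sketch of the RSAES-PKCS1-v1_5 encoding from RFC 8017 (section 7.2.1), which pads a message to k octets, the length of the modulus, as 0x00 || 0x02 || PS || 0x00 || M, where PS is random non-zero padding, before the result is converted with OS2IP. The 2048-bit key size and the function name eme_pkcs1_v1_5_encode are assumptions for illustration. RFC 8017 recommends RSAES-OAEP for new applications and keeps PKCS1-v1_5 only for compatibility, so this is a teaching sketch rather than something to deploy.

```python
import secrets


def eme_pkcs1_v1_5_encode(message: bytes, k: int) -> bytes:
    """Sketch of RSAES-PKCS1-v1_5 encoding; k is the modulus length in octets.

    Returns EM = 0x00 || 0x02 || PS || 0x00 || M.
    """
    if len(message) > k - 11:
        raise ValueError("message too long")
    ps_len = k - len(message) - 3      # at least 8 because of the check above
    ps = bytearray()
    while len(ps) < ps_len:
        b = secrets.token_bytes(1)
        if b != b"\x00":               # PS must not contain zero octets
            ps += b
    return b"\x00\x02" + bytes(ps) + b"\x00" + message


modulus_bits = 2048                    # hypothetical key size
k = (modulus_bits + 7) // 8            # modulus length in octets (256)
message = "Men of few words are the best men.".encode("utf-8")
em = eme_pkcs1_v1_5_encode(message, k)
m = int.from_bytes(em, "big")          # OS2IP of the padded block
print(len(em), m.bit_length())
```

Because EM begins with a zero octet, the resulting integer is always smaller than any k-octet modulus, as RSA requires. For a real key with modulus n, k would be (n.bit_length() + 7) // 8.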