Tags: python, base64, ascii, non-ascii-characters

Determining the Encoding of a String in Python


I'm new to the site, so please let me know if I need to change anything about this question! I'm also rather inexperienced with Base64 in general, so please bear with me!

In Python, I have a short program that simply decodes a Base64 string:

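import base64

def decodeBase64(string):
    # Base64 input must be a multiple of 4 characters long,
    # so append only as many '=' as are actually missing (0 to 3).
    decodeableString = string + '=' * (-len(string) % 4)

    # b64decode returns a bytes object, not a str.
    return base64.b64decode(decodeableString)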

When trying to decode:

0J3QuNC20LUg0L/RgNC40LLQtdC00LXQvSDQutC+0LQg0LTQvtGB0YLRg9C/0LAg0Log0LfQtNCw0L3QuNGOIFvQo9CU0JDQm9CV0J3Qnl06Ck9WSzhZTFggLyAo0JjQnNCvIC8g0JrQm9Cu0KcpCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09ID09PT09PQrQkdCw0LfQsCAzNg==

as part of a challenge, the decoded data turned out to contain Russian characters, which my program didn't know how to handle, so it just returned the raw bytes:

b'\xd0\x9d\xd0\xb8\xd0\xb6\xd0\xb5 \xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd0\xb4\xd0\xb5\xd0\xbd \xd0\xba\xd0\xbe\xd0\xb4 \xd0\xb4\xd0\xbe\xd1\x81\xd1\x82\xd1\x83\xd0\xbf\xd0\xb0 \xd0\xba \xd0\xb7\xd0\xb4\xd0\xb0\xd0\xbd\xd0\xb8\xd1\x8e [\xd0\xa3\xd0\x94\xd0\x90\xd0\x9b\xd0\x95\xd0\x9d\xd0\x9e]:\nOVK8YLX / (\xd0\x98\xd0\x9c\xd0\xaf / \xd0\x9a\xd0\x9b\xd0\xae\xd0\xa7)\n================================================== ======\n\xd0\x91\xd0\xb0\xd0\xb7\xd0\xb0 36'

Using a different decoder online, I learned that this contains Russian characters. Is there a relatively simple way to have my program check whether a decoded Base64 string contains non-ASCII characters and, if so, decode it accordingly?


Solution

  • In your particular case, the decoded bytes are UTF-8 encoded text.

    In Python 3.x, b64decode returns bytes, so you have to decode them to str. Assuming the decoded bytes are in x:

    >>> x.decode('utf-8')
    'Ниже приведен код доступа к зданию [УДАЛЕНО]:\nOVK8YLX / (ИМЯ / КЛЮЧ)\n================================================== ======\nБаза 36'
    

    However, in the general case you can only guess the encoding. See this and related questions.
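
To tie the two points above together, here is a minimal sketch of one way to check for non-ASCII bytes and then try a few encodings in turn. The function name decode_base64_text and the candidate list (utf-8, cp1251, koi8-r) are assumptions chosen for illustration, not part of the original answer, and a successful decode only means the bytes were valid in that encoding, not that the guess is right.

```python
import base64

def decode_base64_text(encoded, candidates=("utf-8", "cp1251", "koi8-r")):
    """Decode a Base64 string and return (text, encoding_used).

    Hypothetical helper: the candidate encodings are an assumption and
    should be adapted to the data you expect.
    """
    # Add only the missing '=' padding (0 to 3 characters).
    padded = encoded + "=" * (-len(encoded) % 4)
    raw = base64.b64decode(padded)

    # bytes.isascii() (Python 3.7+) is True when every byte is below 0x80.
    if raw.isascii():
        return raw.decode("ascii"), "ascii"

    # Try each candidate; the first one that decodes without error wins.
    # UTF-8 goes first because it is strict and rarely accepts other data.
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue

    # latin-1 maps every byte to a character, so this never raises,
    # but the resulting text may be meaningless.
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    # First 12 characters of the string from the question.
    print(decode_base64_text("0J3QuNC20LUg"))
```

Order matters here: single-byte encodings such as koi8-r accept almost any byte sequence, so they belong after the stricter UTF-8 attempt.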