Tags: python, radix, ternary

How do you decode a string in ternary (or any unusual base)?


For binary, quaternary, octal and hex, it's clear to me how to convert a stream of digits to plain text. Can anybody help me understand how to do it for ternary, quinary, senary and other bases? (Here's a ternary example; a Python script would be appreciated.) So far I've tried:

  • Decoding every digit to its binary counterpart: 0->00, 1->01 and 2->10
  • Splitting the data into chunks of 3 characters and mapping them to the English alphabet, which didn't work either: 000->a, 001->b and so on up to 221->z

Here's my code:

from numpy import *
import binascii

base =3
base_data = ''
with open ("./base%s"%base,'r') as b3:
    for line in b3:
        base_data = base_data + line.strip('\r\n')
output = []
all_nums_in_base = range(base)
list_chars = list(base_data)
final = ''
for char in list_chars:
    if char == '0':
        output += ['0']
    elif char == '1':
        output += ['1']
    elif char == '2':
        output += ['1','0']
output = ''.join(output)
n = int('0b'+output,2)
print binascii.unhexlify('%x' % n)

and my result is in this format:

JMN4,�J�j�T*2VYI�F�%��TjYCL���Y�E�&�
�I��̚dYCL�Z�
�K*�թ��-P��Qie�K"q�jL��5j�Y���K0�C�K2i�f�

Solution

  • With your example data (abbreviated here):

    > s = '''010020010202011000010200011010011001010202001012...'''
    > ''.join(chr(int(s[i:i+6], 3)) for i in range(0, len(s), 6))
    => 'Welcome to base 3!\nLorem ipsum dolor sit amet, consectetur ...'
    

    I guessed that each character is encoded as six ternary digits because the length of your example data is a multiple of 6, and 3^6 (729) is the smallest power of 3 larger than or equal to 2^8 (256).
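The same reasoning can be sketched for other bases. This is only a sketch: it assumes the data uses fixed-width groups, one group per 8-bit character, with the smallest width that can hold 256 values. That matches the guess above but is not guaranteed for every encoder. The helper names (`digits_per_char`, `decode`, `encode`) are made up for illustration, and since it relies on `int(x, base)` it only covers bases 2 to 36.

```python
import string

# Digit symbols for bases 2..36, the same range int(x, base) accepts.
DIGITS = string.digits + string.ascii_lowercase


def digits_per_char(base, n_values=256):
    """Return the smallest k such that base**k >= n_values."""
    k = 1
    while base ** k < n_values:
        k += 1
    return k


def decode(data, base, width=None):
    """Decode fixed-width groups of base-`base` digits into text."""
    if width is None:
        width = digits_per_char(base)  # assumption: one group per 8-bit char
    if len(data) % width:
        raise ValueError("data length is not a multiple of %d" % width)
    return "".join(chr(int(data[i:i + width], base))
                   for i in range(0, len(data), width))


def encode(text, base, width=None):
    """Inverse of decode, handy for round-trip checks."""
    if width is None:
        width = digits_per_char(base)
    out = []
    for ch in text:
        n = ord(ch)
        if n >= base ** width:
            raise ValueError("character %r does not fit in %d digits" % (ch, width))
        group = []
        for _ in range(width):
            n, r = divmod(n, base)  # peel off the least significant digit
            group.append(DIGITS[r])
        out.append("".join(reversed(group)))
    return "".join(out)


# First 48 digits of the abbreviated example data above.
sample = "010020010202011000010200011010011001010202001012"
print(decode(sample, 3))  # prints 'Welcome ' (with a trailing space)
```

Under this assumption, quinary and senary would both use 4 digits per character (5^4 = 625 and 6^4 = 1296 are the first powers that reach 256). If the output looks wrong, check whether the data length is a multiple of the guessed width, or pass a different `width` explicitly.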