Python: How do I convert a file to a custom base number and back?


I have a file that I want to convert into a custom base (base 86, for example, with a custom alphabet).

I tried converting the file with hexlify and then into my custom base, but it is too slow: 8 seconds for 60 KB.

def HexToBase(Hexa, AlphabetList, OccurList, threshold=10):
    number = int(Hexa, 16)  # hex string (base 16) to integer
    # GetAlphabet returns a list of all chars that occur more than threshold times
    alphabet = GetAlphabet(AlphabetList, OccurList, threshold)

    b_nbr = len(alphabet) #get the base
    out = ''
    while number > 0:
        out = alphabet[(number % b_nbr)] + out
        number = number // b_nbr
    return out

import binascii

with open("File.jpg", "rb") as file:
    binary_data = file.read()
HexToBase(binascii.hexlify(binary_data), ['a','b'], [23,54])

So, could anyone help me find the right solution?

Sorry for my poor English, I'm French. Thanks for your help!


Solution

  • First, you can replace:

    int(binascii.hexlify(binary_data), 16) # timeit: 14.349809918712538
    

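    with:

    int.from_bytes(binary_data, byteorder='little') # timeit: 3.3330371951720164

    Note that int(binascii.hexlify(binary_data), 16) reads the bytes most-significant first, so byteorder='big' is what reproduces exactly the same integer. byteorder='little' gives a different number, which is fine as long as the decoder uses the same byte order.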

    Second, you can use the divmod function to speed up the loop:

    out = ""
    while number > 0:
        number, m = divmod(number, b_nbr)
        out = alphabet[m] + out
    
    # timeit: 3.8345545611298126 (divmod) vs 7.472579440019706 (% and //)
    

    For a comparison of divmod against the % and // operators on large numbers, see the question "Is divmod() faster than using the % and // operators?".

    (Remark: I expected that building a list and then making a string with "".join would be faster than out = ... + out, but that was not the case with CPython 3.6.)
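The comparison in this remark can be reproduced with a small sketch like the one below. The function names, the hexadecimal alphabet, and the test input are hypothetical choices made for illustration and are not part of the original answer. Timings depend on the Python version and the input size, so no expected figures are given.

```python
import timeit

# Hypothetical alphabet, chosen only so the result can be checked against format(n, "x").
ALPHABET = "0123456789abcdef"


def to_base_prepend(number, alphabet):
    """Build the result by prepending each digit to a string."""
    base = len(alphabet)
    out = ""
    while number > 0:
        number, m = divmod(number, base)
        out = alphabet[m] + out
    return out


def to_base_join(number, alphabet):
    """Collect digits in a list, reverse it, then join once at the end."""
    base = len(alphabet)
    digits = []
    while number > 0:
        number, m = divmod(number, base)
        digits.append(alphabet[m])
    digits.reverse()
    return "".join(digits)


if __name__ == "__main__":
    # Arbitrary 2 KiB input standing in for the file contents.
    number = int.from_bytes(bytes(range(1, 257 - 1)) * 8, byteorder="big")
    for func in (to_base_prepend, to_base_join):
        elapsed = timeit.timeit(lambda: func(number, ALPHABET), number=5)
        print(func.__name__, elapsed)
```

Both variants produce the same string, so whichever measures faster on your interpreter can be used.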

    Put together, these changes gave me a speed-up factor of 6.
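The question also asks how to convert back, which the steps above do not cover. Here is a minimal round-trip sketch, not a definitive implementation, built on these assumptions:

- The 86-character `ALPHABET` and the names `encode` and `decode` are hypothetical.
- The byte order is big-endian, which matches the integer produced by the hexlify approach.
- A one-byte sentinel is prepended so that leading zero bytes survive the round trip. This is an addition of this sketch. Without it, a plain integer conversion drops those bytes.

```python
import string

# Hypothetical 86-character alphabet: letters, digits, then some punctuation.
ALPHABET = (string.ascii_letters + string.digits + string.punctuation)[:86]


def encode(data: bytes, alphabet: str = ALPHABET) -> str:
    """Encode bytes as a string of digits in base len(alphabet)."""
    base = len(alphabet)
    # Assumption of this sketch: a 0x01 sentinel byte keeps leading zero bytes.
    number = int.from_bytes(b"\x01" + data, byteorder="big")
    out = ""
    while number > 0:
        number, m = divmod(number, base)
        out = alphabet[m] + out
    return out


def decode(text: str, alphabet: str = ALPHABET) -> bytes:
    """Reverse encode(): rebuild the integer, then the original bytes."""
    base = len(alphabet)
    value_of = {char: value for value, char in enumerate(alphabet)}
    number = 0
    for char in text:
        number = number * base + value_of[char]
    raw = number.to_bytes((number.bit_length() + 7) // 8, byteorder="big")
    return raw[1:]  # drop the sentinel byte added by encode()
```

For a file, one would read it in binary mode, pass the bytes to `encode`, and later write the result of `decode` back out in binary mode. The decoder must use the same alphabet, in the same order, as the encoder.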