I have a file that I want to convert into a custom base (base 86, for example, with a custom alphabet).
I tried converting the file with hexlify and then into my custom base, but it's too slow: 8 seconds for 60 KB.
def HexToBase(Hexa, AlphabetList, OccurList, threshold=10):
    number = int(Hexa, 16)  # base 16 to base 10
    alphabet = GetAlphabet(AlphabetList, OccurList, threshold)
    # GetAlphabet returns a list of all chars that occur more than threshold times
    b_nbr = len(alphabet)  # get the base
    out = ''
    while number > 0:
        out = alphabet[(number % b_nbr)] + out
        number = number // b_nbr
    return out
import binascii

with open("File.jpg", "rb") as file:
    binary_data = file.read()
HexToBase(binascii.hexlify(binary_data), ['a', 'b'], [23, 54])
So, could anyone help me find the right solution?
Sorry for my poor English, I'm French. Thanks for your help!
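First, you can replace:
int(binascii.hexlify(binary_data), 16) # timeit: 14.349809918712538
with:
int.from_bytes(binary_data, byteorder='little') # timeit: 3.3330371951720164
(Note that byteorder='little' reads the bytes in reverse order, so the resulting number differs from the hexlify version; use byteorder='big' if you need the same value.)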
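As a quick check of how the two conversions relate, here is a minimal sketch on a three-byte sample (the `data` value is made up for illustration and stands in for the file contents). It suggests that the hexlify route corresponds to `byteorder='big'`, while `byteorder='little'` produces a different integer:

```python
import binascii

# Small made-up sample standing in for the bytes read from the file.
data = bytes([0x01, 0x02, 0xFF])

via_hex = int(binascii.hexlify(data), 16)               # the original route
via_big = int.from_bytes(data, byteorder='big')         # same value as the hex route
via_little = int.from_bytes(data, byteorder='little')   # bytes read in reverse order

print(via_hex == via_big)
print(via_hex == via_little)
```

Either byte order gives a usable encoding as long as the decoder uses the same one.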
Second, you can use the divmod function to speed up the loop:
out = ""
while number > 0:
    number, m = divmod(number, b_nbr)
    out = alphabet[m] + out
# timeit: 3.8345545611298126 vs 7.472579440019706
For a comparison of divmod with % and // on large numbers, see "Is divmod() faster than using the % and // operators?".
(Remark: I expected that building a list and then making a string with "".join would be faster than out = ... + out, but that was not the case with CPython 3.6.)
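For reference, the list-and-join variant mentioned in the remark might look roughly like the sketch below. The function name `to_base_join` is hypothetical, and the code assumes `alphabet` is an indexable sequence of single-character strings:

```python
def to_base_join(number, alphabet):
    """Convert a non-negative int to a string, collecting digits in a list."""
    base = len(alphabet)
    digits = []
    while number > 0:
        number, m = divmod(number, base)
        digits.append(alphabet[m])  # least significant digit comes first
    digits.reverse()                # put the most significant digit first
    return "".join(digits)


print(to_base_join(255, "01"))
```

Whether this beats string prepending likely depends on the interpreter version and input size, so it is worth timing both on your own data.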
Everything put together gave me a speed-up factor of 6.
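Put together, the two changes could be sketched as follows. The function name `bytes_to_base` and the 86-character alphabet are assumptions made for illustration (the question builds its alphabet with `GetAlphabet`, which is not shown), and `byteorder='big'` is used here so the number matches the hexlify route from the question:

```python
def bytes_to_base(data, alphabet):
    """Encode bytes as a string of digits in base len(alphabet)."""
    number = int.from_bytes(data, byteorder='big')  # replaces int(hexlify(data), 16)
    base = len(alphabet)
    out = ""
    while number > 0:
        number, m = divmod(number, base)  # quotient and remainder in one call
        out = alphabet[m] + out
    return out


# Hypothetical alphabet: 86 printable ASCII characters, chosen arbitrarily.
alphabet = [chr(c) for c in range(33, 119)]
encoded = bytes_to_base(b"\x01\x02\xff", alphabet)
print(encoded)
```

As in the original loop, an input whose integer value is zero yields an empty string, and leading zero bytes are not preserved, so a decoder would need the original length to restore them.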