Search code examples
pythonalgorithmpython-3.xinteger-overflowpow

Convert a very large base n number to bytes


I have a very large number in base n (n is specified by user), stored as an array with each element representing a digit. u[0] is the highest digit, u[1] is the second highest, u[-1] is the lowest digit and so on. Leading zeros are understood to be meaningless: For instance, if n is 8, [0, 0, 0, 4, 7, 3] is equivalent to [4, 7, 3] and both equal (473) in base 8, or 315 in base 10, or 13B in hex, or [1, 59] as a byte array.

I want to convert this into an array of bytes which correspond to base-256 representation of the same number, with minimal leading zeros. I have the following code to do so:

def base_n_to_byte_array(digits, from_base):
    """ Converts a base n number to a byte array.

    :param digits: Digits of the number, starting from highest.
    :param from_base: Base in which the number is given.
    """

    x = 0
    n = len(digits)
    for i in range(0, len(digits)):
        x += digits[i] * int(math.pow(from_base, n - i - 1))

    min_length = max(math.ceil(math.log(x, 256)), 1)
    byte_array = x.to_bytes(min_length, byteorder='big')
    return byte_array

This works for smaller numbers (a few hundred digits). However, it turns out that math.pow is pretty limited, for instance if we use base 8, math.pow(8, 341) is the highest power I can get, and math.pow(8,342) fails with OverflowError: math range error.

I know that the common way of dealing with large numbers is to represent them as floating points - but in this case I am using this code to encode/decode binary files into alternative representations (eg. trytes). Therefore, if due to loss of precision the less significant bytes are altered, a lot of the data will be corrupted, so I can't use an approximate power calculation - I need the result to be exact.

How can I solve this problem? Is there a version of math.pow that doesn't overflow? Is there a more efficient base conversion algorithm that I'm overlooking?


Solution

  • Is there a version of math.pow that doesn't overflow?

    Try using the built-in exponentiation operator, **. AFAIK it doesn't have the same limitations that math.pow does.

    >>> math.pow(8,342)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: math range error
    
    >>> 8**342
    719077253944926363091722076315609893447190791576922629093720324630930703222003852530833909289630144084480455519485573430635159075257666489971389722557896497511071573699461941105208878404984376477812331808340023075352602729369851525895652442163308948653402042738345192959788983753918865219341425318496896548864L