Search code examples
pythonalgorithmlistbinarystring-concatenation

Python algorithm for converting an integer to binary: questions about efficiency


I made a simple algorithm for converting an integer into binary, and returning the binary as a string. Currently the bits are added to the front of a list. Would it be more efficient (in terms of either time or space) to concatenate strings at each step instead of appending to a list? Why/why not?

def toBinary(n):
    l = []
    while n > 0:
        bit = n%2
        n = math.floor(n/2)
        l = [bit]+l
    return ''.join(map(str,l))

I know an easier way to do the exact same thing would be to use:

bin(10)[2:]

(with the slice just taking off the prefix) but I wanted to try implementing the algorithm myself.

[edit] I see that append() is more efficient than simple concatenation (from this post). What about appending at each step and then reversing the list? Or is there a similarly efficient function for adding to the beginning of the list (like insert())?


Solution

  • When building a string from disparate characters, using a list then str.join() is faster than appending to a string. The latter has to build a new string object for each character.

    That said, you can improve on your version by using bit shifting and bit masking:

    def toBinary(n):
        l = []
        while n:
            l.append(n & 1)
            n >>= 1
        return ''.join(map(str, reversed(l)))
    

    This still builds up the bits one by one from least significant to most, but appends rather than prepends to the list. We then simply reverse the list at the end.

    Of course, the most efficient way to convert an integer to binary in Python is to not do it in Python. The built-in bin() function is nice, but the format() function is nicest, as it lets you specify wether or not to include the 0b prefix, and specific the number of digits to display:

    >>> format(10, 'b')
    '1010'
    >>> format(10, '#b')
    '0b1010'
    >>> format(10, '08b')
    '00001010'
    >>> format(10, '#010b')
    '0b00001010'