python, encoding, binary

Is there a way to get a binary number array without converting it to a string first?


I am working on a problem where I need to manipulate binary data. The easiest way for me to do this would be with arrays, since using a binary string representation is not allowed. I can write the code below, or do the same thing with bin(), but the numbers always end up converted to strings. I need an array of type int or bool. How can this be done?

ascii = list("ABCD".encode('ascii'))
arr = list(map(lambda x: [*format(x, '08b')], ascii))

I tried using bin() and format() to get the binary digits in Python, but both return strings.


Solution

    • Using strings for binary data is not ideal: each '0' or '1' character takes at least a full byte (plus per-object overhead) just to represent a single bit.

    List comprehension (suggested by @Mark in the comments):

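    • Since you don't seem to have many operations, a list of lists is fine for storing the bits:
    codes = "ABCD".encode('ascii')  # bytes; iterating yields the ints 65, 66, 67, 68
    # For each byte, shift bit (7 - i) down to position 0 and mask off everything else
    res = [[(byte >> (7 - i)) & 1 for i in range(8)] for byte in codes]
    print(res)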
    

    Prints

    [[0, 1, 0, 0, 0, 0, 0, 1], 
    [0, 1, 0, 0, 0, 0, 1, 0], 
    [0, 1, 0, 0, 0, 0, 1, 1], 
    [0, 1, 0, 0, 0, 1, 0, 0]]
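Going back from bits to bytes needs no strings either. A minimal sketch, assuming the same most-significant-bit-first order as the lists above; `to_bits` and `from_bits` are hypothetical helper names, not part of any library:

```python
from __future__ import annotations


def to_bits(value: int, width: int = 8) -> list[int]:
    """Return the `width` lowest bits of `value`, most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Rebuild an integer from a most-significant-bit-first list of 0/1 ints."""
    value = 0
    for bit in bits:
        # Shift the accumulated value left to make room, then set the new lowest bit.
        value = (value << 1) | bit
    return value


rows = [to_bits(byte) for byte in b"ABCD"]
print(rows[0])                                # [0, 1, 0, 0, 0, 0, 0, 1]
print(bytes(from_bits(row) for row in rows))  # b'ABCD'
```

The `width` parameter lets the same helpers handle values wider than one byte.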
    

    • If you have many operations to perform, you could store the bits in a NumPy array instead:
    import numpy as np
    
    codes = "ABCD".encode('ascii')
    res = []
    
    for byte in codes:
        for i in range(8):
            bit = (byte >> (7 - i)) & 1  # extract bits from most to least significant
            res.append(bit)
    
    print(np.array(res, dtype=bool))
    

    Prints

    [False  True False False False False False  True False  True False False
     False False  True False False  True False False False False  True  True
     False  True False False False  True False False]
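NumPy can also do the unpacking without a Python loop. A sketch using `np.unpackbits`, which expands each `uint8` into its 8 bits (most significant bit first by default), plus a broadcast version of the same shift-and-mask expression used above:

```python
import numpy as np

# View the ASCII bytes as an array of unsigned 8-bit integers (no strings involved).
codes = np.frombuffer("ABCD".encode("ascii"), dtype=np.uint8)

# Expand every byte into 8 bits, then convert 0/1 to False/True.
bits = np.unpackbits(codes).astype(bool)
print(bits)

# Same idea as (byte >> (7 - i)) & 1, vectorized via broadcasting:
shifts = np.arange(7, -1, -1)                    # 7, 6, ..., 0
bits_2d = (codes[:, np.newaxis] >> shifts) & 1   # shape (4, 8): one row per character
print(bits_2d)
```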

    Comments

    Isn't Numpy kind of overkill just to convert 1 and 0 to True and False? Why not: res.append(bool(bit)) or even just leave them as ints? by @Mark

    • Every element of a Python list is a reference to a full Python object.

    • NumPy arrays hold elements of a single fixed type in a contiguous buffer, instead of dynamically typed objects as a Python list does. This makes NumPy arrays more efficient for computationally intensive tasks.

    • For a handful of operations NumPy could indeed be overkill. But since the question concerns bit manipulation, the data could be large and the number of operations high (millions, say), and in that case NumPy arrays are an efficient choice.
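To make the memory argument concrete, here is a rough sketch comparing storage for one million bits. The list size reported by `sys.getsizeof` depends on the CPython build, so treat it as indicative; `np.packbits` goes one step further than a `bool` array and stores eight bits per byte:

```python
import sys

import numpy as np

n_bits = 1_000_000
bits_list = [1, 0] * (n_bits // 2)           # plain Python list of int objects
bits_bool = np.array(bits_list, dtype=bool)  # one byte per bit
bits_packed = np.packbits(bits_bool)         # eight bits per byte

# sys.getsizeof counts only the list's internal pointer array, not the int objects
# it references (CPython shares small ints such as 0 and 1 anyway).
print("list   :", sys.getsizeof(bits_list), "bytes")
print("bool   :", bits_bool.nbytes, "bytes")
print("packed :", bits_packed.nbytes, "bytes")

# Packing is lossless: unpacking restores the original bits.
assert (np.unpackbits(bits_packed).astype(bool) == bits_bool).all()
```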

    Rough benchmark

    
    import time
    import numpy
    
    data = list(range(100000000))  # 100 million ints; needs several GB of RAM
    arr = numpy.array(data)
    
    start = time.perf_counter()
    list_mult = [x * 2 for x in data]
    end = time.perf_counter()
    print(f"List: {end - start} seconds")
    
    start = time.perf_counter()
    arr_mult = arr * 2
    end = time.perf_counter()
    print(f"Array: {end - start} seconds")
    

    Prints

    List: 12.309393882751465 seconds 
    Array: 2.8275811672210693 seconds
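Single wall-clock measurements like these are noisy. A sketch of a slightly more robust version using the standard `timeit` module, taking the best of several runs; the input size here is an assumption chosen so it finishes quickly, and the timings will vary by machine:

```python
import timeit

import numpy as np

n = 1_000_000  # smaller than the benchmark above so it runs in a few seconds
data = list(range(n))
arr = np.array(data)


def list_double():
    return [x * 2 for x in data]


def array_double():
    return arr * 2


# Each call to timeit.repeat runs the callable `number` times per repeat;
# the minimum over repeats is the least disturbed by other processes.
list_time = min(timeit.repeat(list_double, number=1, repeat=5))
array_time = min(timeit.repeat(array_double, number=1, repeat=5))

print(f"List : {list_time:.4f} s")
print(f"Array: {array_time:.4f} s")
```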
    

    Note

    How (byte >> (7 - i)) & 1 extracts the bits of 'A' (65 = 01000001), from most to least significant:

    i = 0: (65 >> (7 - 0)) & 1 → (65 >> 7) & 1 → 00000000 & 1 → 0
    i = 1: (65 >> (7 - 1)) & 1 → (65 >> 6) & 1 → 00000001 & 1 → 1
    i = 2: (65 >> (7 - 2)) & 1 → (65 >> 5) & 1 → 00000010 & 1 → 0
    i = 3: (65 >> (7 - 3)) & 1 → (65 >> 4) & 1 → 00000100 & 1 → 0
    i = 4: (65 >> (7 - 4)) & 1 → (65 >> 3) & 1 → 00001000 & 1 → 0
    i = 5: (65 >> (7 - 5)) & 1 → (65 >> 2) & 1 → 00010000 & 1 → 0
    i = 6: (65 >> (7 - 6)) & 1 → (65 >> 1) & 1 → 00100000 & 1 → 0
    i = 7: (65 >> (7 - 7)) & 1 → (65 >> 0) & 1 → 01000001 & 1 → 1
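The trace above can be regenerated programmatically to double-check each step. In this sketch the `:08b` format spec is used only to display the shifted value; the bits themselves are computed with integer operations:

```python
byte = ord("A")  # 65 == 0b01000001

bits = []
for i in range(8):
    shifted = byte >> (7 - i)  # move bit (7 - i) down to position 0
    bit = shifted & 1          # keep only that lowest bit
    bits.append(bit)
    print(f"i = {i}: ({byte} >> {7 - i}) & 1 -> {shifted:08b} & 1 -> {bit}")

print(bits)  # [0, 1, 0, 0, 0, 0, 0, 1]
```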
    
    [False  True False False False False False  True  # 'A' -> 01000001
     False  True False False False False  True False  # 'B' -> 01000010
     False  True False False False False  True  True  # 'C' -> 01000011
     False  True False False False  True False False] # 'D' -> 01000100
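To get one row of bits per character, as in the annotated rows above, the flat array can simply be reshaped. A small sketch:

```python
import numpy as np

flat = np.array(
    [(byte >> (7 - i)) & 1 for byte in b"ABCD" for i in range(8)], dtype=bool
)

# reshape(-1, 8) splits the 32 flat bits into rows of 8; -1 lets NumPy infer the row count.
per_char = flat.reshape(-1, 8)
print(per_char)
print(per_char.shape)  # (4, 8)
```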
    

    Note that ANDing with 1 keeps only the lowest bit: 1 has a single set bit, so every other bit of the result is cleared, while the lowest bit passes through unchanged:

    0 & 1 → 0
    1 & 1 → 1
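The same masking idea generalizes: instead of shifting the byte down and ANDing with 1, you can shift the 1 up to build a mask for bit k. A small illustration; the example value is arbitrary:

```python
byte = 0b01000011  # 67, the code for 'C'; bits 0, 1 and 6 are set

# & 1 keeps only bit 0 because every other bit of the mask is 0.
print(byte & 1)  # 1

# Testing bit k directly: build the mask 1 << k and check whether anything survives.
for k in (0, 1, 2, 6):
    is_set = (byte & (1 << k)) != 0
    print(k, is_set)

# For non-negative integers, x & 1 gives the same result as x % 2.
assert all((x & 1) == (x % 2) for x in range(256))
```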