Fast Bitwise Get Column in Python

Is there an efficient way to get an array of boolean values that are in the n-th position in bitwise array in Python?

Create numpy array with values 0 or 1:

import numpy as np

array = np.array(
    [
     [1, 0, 1],   
     [1, 1, 1], 
     [0, 0, 1],    
    ]
)

Compress size by np.packbits:

pack_array = np.packbits(array, axis=1)

Expected result - some function that could get all values from n-th column from bitwise array. For example if I would like the second column I would like to get (the same as I would call array[:,1]):

array([0, 1, 0])

I have tried numba with the following function. It returns right results but it is very slow:

import numpy as np
from numba import njit

@njit(nopython=True, fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res

How to test it?

import numpy as np
import time
from numba import njit

array = np.random.choice(a=[False, True], size=(100000000,15))

pack_array = np.packbits(array, axis=1)

start = time.time()
array[:,10]
print('np array')
print(time.time()-start)

@njit(nopython=True, fastmath=True)
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=np.int32)
    for i in range(n):
        res[i] = bool(packed[i, j//8] & (128>>(j%8)))
    return res

# To initialize
getVector(pack_array, 10)

start = time.time()
getVector(pack_array, 10)
print('getVector')
print(time.time()-start)

It returns:

np array
0.00010132789611816406
getVector
0.15648770332336426

Solution

Besides some micro-optimisations, I dont believe that there is much that can be optimised here. There are also a few small mistakes in your code:

@njit(nopython=True) is saying the same thing twice (the n in njit already stands for nopython mode.) simply @njit or @jit(nopython=True) should be used
fastMath is for "cutting corners" when doing floating point math, since we are only working with integers and booleans, it can be safely removed because it does nothing for us here.

My updated code (seeing a meagre 40% perfomance increase on my machine):

import numba as nb
import numpy as np

np.random.seed(0)
array = np.random.choice(a=[False, True], size=(10000000,15))

pack_array = np.packbits(array, axis=1)

@nb.njit(locals={'res': nb.boolean[:]})
def getVector(packed, j):
    n = packed.shape[0]
    res = np.zeros(n, dtype=nb.boolean)
    byte = j//8
    bit = 128>>(j%8)
    for i in range(n):
        res[i] = bool(packed[i, byte] & bit)
    return res

getVector(pack_array, 10)

In your answer, "res" is a list of 32 bit integers, by giving np.zeros() the numba (NOT numpy) boolean datatype, we can swap it to the more efficient booleans. This is where most of the perfomance improvement comes from. On my machine putting j_mod and j_flr outside of the loop had no noticable effect. But it did have an effect for the commenter @Michael Szczesny, so it might help you aswell.

I would not try to use strides, which @Nick ODell is suggesting, because they can be quite dangerous if used incorrectly. (See the numpy documentation).

edit: I have made a few small changes that were suggested by Michael. (Thanks)