I am using a matrix multiplication method to encode the positions of True and False values in an array; this is necessary because I cannot use a for loop (I have thousands of records). The procedure is the following:
import numpy as np
# Create a test array
test_array = np.array([[False, True, False, False, False, True]])
# Create a set of unique "tens", each one identifying a position
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
# Multiply the matrix
print(int(np.dot(test_array, uniq_tens)[0]))
100010
The 100010 must be read from right to left (0=False, 1=True, 0=False, 0=False, 0=False, 1=True). Everything works fine unless the test_array has 20 elements.
# This works fine - Test with 21 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
21
111000000000000000010
# This works fine - Test with 19 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
19
1000000000000000010
# This does not work - Test with 20 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
20
10000000000000000000
I tested with numpy version 1.16.4/1.19.4 and 1.19.5. Could you please help me in understanding why? I am worried it can happen also with other numbers, not only 20.
Thanks a lot for your help!
You are hitting the int64 limit:
print(len(str(2 ** (64 - 1))))
# 19
when computing uniq_tens, which causes some data type problems in conjunction with np.dot() on mixed data type inputs.
More precisely, what happens here is that:
- the content of uniq_tens is Python's int, which has arbitrary precision
- when computing np.dot(), the uniq_tens list is converted to a NumPy array with unspecified data type
- for values up to np.iinfo(np.int64).max, the data type is inferred to be int64
- for values between np.iinfo(np.int64).max and np.iinfo(np.uint64).max, the data type is inferred to be uint64
- for values above np.iinfo(np.uint64).max, the data type is inferred to be object
- np.dot() casts its inputs to a common data type if they are of mixed dtype; in the case of np.bool_ and np.uint64, the inferred common type is np.float64.
Now:
max_int64 = np.iinfo(np.int64).max
print(max_int64, len(str(max_int64)))
# 9223372036854775807 19
max_uint64 = np.iinfo(np.uint64).max
print(max_uint64, len(str(max_uint64)))
# 18446744073709551615 20
print(repr(np.array([max_int64])))
# array([9223372036854775807])
print(repr(np.array([max_uint64])))
# array([18446744073709551615], dtype=uint64)
print(repr(np.array([max_uint64 + 1])))
# array([18446744073709551616], dtype=object)
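A related, well-known instance of this promotion mechanism (not specific to np.dot()): NumPy has no integer dtype that can hold both the int64 and the uint64 range, so mixing the two falls back to float64:

```python
import numpy as np

# there is no 128-bit integer dtype, so the only common type that can
# (approximately) hold both int64 and uint64 values is float64
print(np.result_type(np.int64, np.uint64))  # float64
```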
So, up to 19 digits (int64) and from 21 digits upward (object) everything works well.
When you use 20 digits, the list does convert to uint64.
However, np.dot() then realizes it can use neither int64 nor uint64 to hold the result, and casts everything to np.float64:
print(np.dot([1], [max_int64]))
# 9223372036854775807
print(np.dot([1], [max_uint64]))
# 1.8446744073709552e+19
print(np.dot([1], [max_uint64 + 1]))
# 18446744073709551616
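The precision loss in the question follows directly from float64 having a 53-bit significand: near 10**19 the spacing between consecutive representable values is 2**11 = 2048, so the trailing ...010 cannot survive the cast. A quick check:

```python
# float64 has 53 significant bits; near 10**19 consecutive representable
# values are 2048 apart, so the small correction of +10 is rounded away
x = 10 ** 19 + 10  # the exact result for the 20-element example
print(float(x) == float(10 ** 19))  # True: the trailing digits are lost
```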
Instead, when you start with something that is already a uint64, it keeps using that dtype:
print(np.dot(np.array([1], dtype=np.uint64), [max_uint64]))
# 18446744073709551616
print(np.dot(np.array([4321], dtype=np.uint64), [max_uint64]))
# 18446744073709547295 # wrong result
which has its own overflow problems.
One way to make sure the above code works all the time is to force the dtype of uniq_tens to object:
import numpy as np
test_array = np.array([[0, 1] + [0] * 17 + [1]])
uniq_tens = np.array([(10 ** i) for i in range(test_array.shape[1])], dtype=object)
print(test_array.shape[1], int(np.dot(test_array, uniq_tens)[0]))
# 20 10000000000000000010
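Incidentally, if the encoded number later needs to be decoded back into positions, the same right-to-left convention from the question can be applied to its digit string (a minimal sketch, not part of the original procedure):

```python
# decode the "tens" encoding back to a boolean mask: the i-th digit
# from the right corresponds to position i in the original array
code = 10000000000000000010
mask = [d == "1" for d in str(code)[::-1]]
print([i for i, m in enumerate(mask) if m])  # [1, 19]
```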
If we are after the fastest way of computing the integer with a specific base, a number of approaches can be devised:
import numpy as np
import numba as nb
def bools_to_int(arr, base=2):
    return sum(base ** i for i, x in enumerate(arr.tolist()) if x)


def bools_to_int_dot(arr, base=2):
    pows = np.array([base ** i for i in range(len(arr))], dtype=object)
    return np.dot(arr, pows)


def bools_to_int_mul_sum(arr, base=2):
    pows = np.array([base ** i for i in range(len(arr))], dtype=object)
    return np.sum(arr * pows)


@nb.njit
def bools_to_int_nb(arr, base=2):
    n = arr.size
    result = 0
    for i in range(n):
        if arr[i]:
            result += base ** i
    return result
The looped approach can also be accelerated with Cython:
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
# cimport numpy as cnp
# cimport cython as ccy
# import numpy as np
# import cython as cy


cpdef bools_to_int_cy(arr, base=2):
    cdef long n = arr.size
    result = 0
    for i in range(n):
        if arr[i]:
            result += base ** i
    return result
Note that the bools_to_int_nb() approach only works up to the int64 limit.
Since the power operation is one of the most expensive in such computation, it can be precomputed to further speed-up multiple calls:
MAX_PRE_VAL = 256
BASES = list(range(2, 16))
POWS = {
    b: np.array([b ** i for i in range(MAX_PRE_VAL)])
    for b in BASES}


def bools_to_int_pre(arr, base=2, pows=POWS):
    return sum(pows[base][i] for i, x in enumerate(arr.tolist()) if x)


def bools_to_int_dot_pre(arr, base=2, pows=POWS):
    return np.dot(arr, pows[base][:len(arr)])


def bools_to_int_mul_sum_pre(arr, base=2, pows=POWS):
    return np.sum(arr * pows[base][:len(arr)])
It is easy to see that all methods produce the same result (except for bools_to_int_nb(), with the limitations already noted):
funcs = (
    bools_to_int, bools_to_int_pre,
    bools_to_int_dot, bools_to_int_dot_pre,
    bools_to_int_mul_sum, bools_to_int_mul_sum_pre,
    bools_to_int_cy, bools_to_int_nb)


rng = np.random.default_rng(42)
arr = rng.integers(0, 2, 112)
for func in funcs:
    print(f"{func.__name__!s:>32} {func(arr)}")
bools_to_int 3556263535586786347937292461931686
bools_to_int_pre 3556263535586786347937292461931686
bools_to_int_dot 3556263535586786347937292461931686
bools_to_int_dot_pre 3556263535586786347937292461931686
bools_to_int_mul_sum 3556263535586786347937292461931686
bools_to_int_mul_sum_pre 3556263535586786347937292461931686
bools_to_int_cy 3556263535586786347937292461931686
bools_to_int_nb -4825705174627124058
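The Numba value is consistent with int64 wraparound: since every addition and multiplication inside bools_to_int_nb() is effectively carried out modulo 2**64, the overflowed result should equal the exact result modulo 2**64 (a quick consistency check, assuming plain two's-complement overflow):

```python
# the exact value and the overflowed int64 value should agree modulo 2**64
exact = 3556263535586786347937292461931686
wrapped = exact % 2 ** 64
if wrapped >= 2 ** 63:  # reinterpret as a signed 64-bit integer
    wrapped -= 2 ** 64
print(wrapped)
```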
With the following code to produce the timings:
rng = np.random.default_rng(42)
timings = {}
k = 16
for n in range(1, 128, 3):
    arrs = rng.integers(0, 2, (k, n))
    print(f"n = {n}")
    timings[n] = []
    base = [funcs[0](arr) for arr in arrs]
    for func in funcs:
        res = [func(arr) for arr in arrs]
        is_good = base == res
        timed = %timeit -r 8 -n 16 -q -o [func(arr) for arr in arrs]
        timing = timed.best * 1e6 / k
        timings[n].append(timing if is_good else None)
        print(f"{func.__name__:>24} {is_good} {timing:10.3f} µs")
to be plotted with:
import pandas as pd
df = pd.DataFrame(data=timings, index=[func.__name__ for func in funcs]).transpose()
df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', figsize=(10, 8))
This shows that, before hitting the int64 limit, the bools_to_int_nb() approach is by far the fastest.
Conversely, for larger values, np.dot() with pre-computed powers is the fastest.
Without pre-computing, simple manual looping is the fastest, and the Cython acceleration provides a small but appreciable speed-up.
Beware that the power-of-2 case can probably be optimized further.
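For base 2 specifically, one possible shortcut (a sketch not from the answer above, assuming arr holds 0/1 values): np.packbits packs the bits into bytes and int.from_bytes rebuilds the arbitrary-precision integer, avoiding per-element exponentiation entirely.

```python
import numpy as np

def bools_to_int_bits(arr):
    # reverse so that arr[i] (weight 2**i) becomes the most significant bit,
    # pack 8 bits per byte, then rebuild the integer from big-endian bytes;
    # packbits zero-pads the last byte, so shift the padding back out
    pad = -len(arr) % 8
    packed = np.packbits(arr[::-1].astype(np.uint8))
    return int.from_bytes(packed.tobytes(), "big") >> pad

arr = np.array([0, 1] + [0] * 17 + [1])
print(bools_to_int_bits(arr))  # 524290 == 2**1 + 2**19
```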