
Numpy matrix multiplication issue with 20 elements


I am using matrix multiplication to encode the positions of True and False values in an array; this is necessary because I cannot use a for loop (I have thousands of records). The procedure is the following:

import numpy as np
# Create a test array
test_array = np.array([[False, True, False, False, False, True]])
# Create a set of unique "tens", each one identifying a position
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
# Multiply the matrix
print(int(np.dot(test_array, uniq_tens)[0]))
100010

The 100010 must be read from right to left (0=False, 1=True, 0=False, 0=False, 0=False, 1=True). Everything works fine except when test_array has 20 elements.
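For reference, the encoded integer can be decoded back into the boolean flags. A minimal sketch (the helper name decode_flags is made up):

```python
def decode_flags(encoded, n):
    # Digit i, counting from the right (least significant digit),
    # corresponds to position i in the original array.
    digits = str(encoded).zfill(n)  # pad with leading zeros up to n digits
    return [d == "1" for d in reversed(digits)]

print(decode_flags(100010, 6))
# [False, True, False, False, False, True]
```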

# This works fine - Test with 21 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
21
111000000000000000010

# This works fine - Test with 19 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
19
1000000000000000010

# This does not work - Test with 20 elements
test_array = np.array([[False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True]])
print(test_array.shape[1])
uniq_tens = [10 ** (i) for i in range(0, test_array.shape[1])]
print(int(np.dot(test_array, uniq_tens)[0]))
20
10000000000000000000

I tested with numpy versions 1.16.4, 1.19.4 and 1.19.5. Could you please help me understand why? I am worried this can also happen with other lengths, not only 20.

Thanks a lot for your help!


Solution

  • Explanation

    You are hitting the int64 limit:

    print(len(str(2 ** (64 - 1))))
    # 19
    

    when computing uniq_tens, which causes data type problems in conjunction with np.dot() when the inputs have mixed data types.


    More precisely, what happens here is that:

    1. the content of uniq_tens consists of Python ints, which have arbitrary precision
    2. when you call np.dot(), the uniq_tens list is converted to a NumPy array with an unspecified data type
      • when the maximum value is at most np.iinfo(np.int64).max, the inferred dtype is int64
      • when the maximum value is between np.iinfo(np.int64).max and np.iinfo(np.uint64).max, the inferred dtype is uint64
      • when the maximum value is above that, the array keeps the Python int objects and retains arbitrary precision
    3. there may be an extra cast in np.dot() if the inputs have mixed dtypes; in the case of np.bool_ and np.uint64, the inferred common type is np.float64
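    A closely related promotion that is easy to check (using np.promote_types, which reports the common dtype NumPy would pick) is int64 with uint64, and it shows why the float64 fallback loses precision:

```python
import numpy as np

# Signed and unsigned 64-bit integers have no common integer type that
# covers both ranges, so NumPy promotes the pair to float64:
print(np.promote_types(np.int64, np.uint64))  # float64

# float64 has only a 53-bit mantissa, so near 10**19 its spacing is 2048
# and small addends are silently lost:
print(float(10**19 + 10) == float(10**19))  # True
```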

    Now:

    max_int64 = np.iinfo(np.int64).max
    print(max_int64, len(str(max_int64)))
    # 9223372036854775807 19
    
    max_uint64 = np.iinfo(np.uint64).max
    print(max_uint64, len(str(max_uint64)))
    # 18446744073709551615 20
    
    print(repr(np.array([max_int64])))
    # array([9223372036854775807])
    print(repr(np.array([max_uint64])))
    # array([18446744073709551615], dtype=uint64)
    print(repr(np.array([max_uint64 + 1])))
    # array([18446744073709551616], dtype=object)
    

    So, with up to 19 elements and with 21 or more, everything works well. With exactly 20 elements, the list does convert to uint64. However, because np.dot() then mixes dtypes, it cannot keep the computation in int64 or uint64 and casts everything to np.float64:

    print(np.dot([1], [max_int64]))
    # 9223372036854775807
    print(np.dot([1], [max_uint64]))
    # 1.8446744073709552e+19
    print(np.dot([1], [max_uint64 + 1]))
    # 18446744073709551616
    

    Instead, when you start with something that is already a uint64 it keeps using that:

    print(np.dot(np.array([1], dtype=np.uint64), [max_uint64]))
    # 18446744073709551615
    print(np.dot(np.array([4321], dtype=np.uint64), [max_uint64]))
    # 18446744073709547295  # wrong result
    

    which has its own overflow problems.
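    That "wrong result" is ordinary modular arithmetic: uint64 operations wrap around modulo 2**64, which can be reproduced with plain Python integers:

```python
# uint64 arithmetic wraps modulo 2**64; reproducing the overflowing
# product from above in pure Python:
max_uint64 = 2**64 - 1
wrapped = (4321 * max_uint64) % 2**64
print(wrapped)  # 18446744073709547295, the same "wrong result" as np.dot
```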


    Mitigation

    One way to make sure the above code works all the time is to force the dtype of uniq_tens to object:

    import numpy as np
    
    
    test_array = np.array([[0, 1] + [0] * 17 + [1]])
    uniq_tens = np.array([(10 ** i) for i in range(test_array.shape[1])], dtype=object)
    
    print(test_array.shape[1], int(np.dot(test_array, uniq_tens)[0]))
    # 20 10000000000000000010
    

    Other approaches

    If we are after the fastest way of computing the integer with a specific base, a number of approaches can be devised:

    import numpy as np
    import numba as nb
    
    
    def bools_to_int(arr, base=2):
        return sum(base ** i for i, x in enumerate(arr.tolist()) if x)
    
    
    def bools_to_int_dot(arr, base=2):
        pows = np.array([base ** i for i in range(len(arr))], dtype=object)
        return np.dot(arr, pows)
    
    
    def bools_to_int_mul_sum(arr, base=2):
        pows = np.array([base ** i for i in range(len(arr))], dtype=object)
        return np.sum(arr * pows)
    
    
    @nb.njit
    def bools_to_int_nb(arr, base=2):
        n = arr.size
        result = 0
        for i in range(n):
            if arr[i]:
                result += base ** i
        return result
    

    The looped approach can also be accelerated with Cython:

    %%cython -c-O3 -c-march=native -a
    #cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
    
    
    
    cpdef bools_to_int_cy(arr, base=2):
        cdef long n = arr.size
        result = 0
        for i in range(n):
            if arr[i]:
                result += base ** i
        return result
    

    Note that the bools_to_int_nb() approach only works up to the int64 limit.
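    Under that constraint, the largest safe input length depends on the base. A small helper (hypothetical name max_safe_length, assuming the worst case where every element is True) can compute it:

```python
def max_safe_length(base, bits=64):
    # Largest n such that sum(base**i for i in range(n)) -- the value
    # when every element is True -- still fits in a signed 64-bit int.
    limit = 2 ** (bits - 1) - 1  # same as np.iinfo(np.int64).max
    n, total = 0, 0
    while total + base ** n <= limit:
        total += base ** n
        n += 1
    return n

print(max_safe_length(2))   # 63
print(max_safe_length(10))  # 19
```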

    Since the power operation is one of the most expensive parts of this computation, the powers can be precomputed to further speed up repeated calls:

    MAX_PRE_VAL = 256
    BASES = list(range(2, 16))
    POWS = {
        b: np.array([b ** i for i in range(MAX_PRE_VAL)])
        for b in BASES}
    
    
    def bools_to_int_pre(arr, base=2, pows=POWS):
        return sum(pows[base][i] for i, x in enumerate(arr.tolist()) if x)
    
    
    def bools_to_int_dot_pre(arr, base=2, pows=POWS):
        return np.dot(arr, pows[base][:len(arr)])
    
    
    def bools_to_int_mul_sum_pre(arr, base=2, pows=POWS):
        return np.sum(arr * pows[base][:len(arr)])
    

    It is easy to see that all methods produce the same result (except for bools_to_int_nb(), with the limitations already noted):

    funcs = (
        bools_to_int, bools_to_int_pre,
        bools_to_int_dot, bools_to_int_dot_pre,
        bools_to_int_mul_sum, bools_to_int_mul_sum_pre,
        bools_to_int_cy, bools_to_int_nb)
    
    
    rng = np.random.default_rng(42)
    arr = rng.integers(0, 2, 112)
    for func in funcs:
        print(f"{func.__name__!s:>32}  {func(arr)}")
    
                        bools_to_int  3556263535586786347937292461931686
                    bools_to_int_pre  3556263535586786347937292461931686
                    bools_to_int_dot  3556263535586786347937292461931686
                bools_to_int_dot_pre  3556263535586786347937292461931686
                bools_to_int_mul_sum  3556263535586786347937292461931686
            bools_to_int_mul_sum_pre  3556263535586786347937292461931686
                     bools_to_int_cy  3556263535586786347937292461931686
                     bools_to_int_nb  -4825705174627124058
    

    With the following code to produce the timings:

    rng = np.random.default_rng(42)
    
    
    timings = {}
    k = 16
    for n in range(1, 128, 3):
        arrs = rng.integers(0, 2, (k, n))
        print(f"n = {n}")
        timings[n] = []
        base = [funcs[0](arr) for arr in arrs]
        for func in funcs:
            res = [func(arr) for arr in arrs]
            is_good = base == res
            timed = %timeit -r 8 -n 16 -q -o [func(arr) for arr in arrs]
            timing = timed.best * 1e6 / k
            timings[n].append(timing if is_good else None)
            print(f"{func.__name__:>24}  {is_good}  {timing:10.3f} µs")
    

    to be plotted with:

    import pandas as pd
    
    
    df = pd.DataFrame(data=timings, index=[func.__name__ for func in funcs]).transpose()
    df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', figsize=(10, 8))
    

    benchmark

    This shows that before hitting the int64 limit, the bools_to_int_nb() approach is by far the fastest. Conversely, for larger inputs, np.dot() with precomputed powers is the fastest. Without precomputation, simple manual looping is fastest, and the Cython version provides a small but appreciable speed-up.

    Beware that the base-2 case can probably be optimized further.
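    For example, for base 2 the array can be turned into a binary string and parsed in a single int() call, since CPython's integer parser is implemented in C. A sketch, not benchmarked here (the name bools_to_int_base2 is made up):

```python
import numpy as np

def bools_to_int_base2(arr):
    # Position i carries weight 2**i, so the most significant bit is
    # the last element: reverse before building the binary string.
    bits = "".join("1" if x else "0" for x in reversed(arr.tolist()))
    return int(bits, 2) if bits else 0

arr = np.array([False, True, False, True])
print(bools_to_int_base2(arr))  # 10, i.e. 2**1 + 2**3
```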