Search code examples
pythonnumpygmpy

Efficiently convert gmpy2.mpz to numpy boolean array


I try to convert from gmpy2.mpz to a numpy boolean array, but can't quite get it right. (gmpy2: https://gmpy2.readthedocs.io)

import gmpy2
import numpy as np

x = gmpy2.mpz(int('1'*1000,2))

print("wrong conversion 1")
y = np.fromstring(gmpy2.to_binary(x), dtype=bool) # this is wrong
print(np.sum(y)) # this returns 127 instead of 1000

print("wrong conversion 2")
y = np.fromstring(gmpy2.to_binary(x), dtype=np.uint8)
print(y) # array([  1,   1, 255 ... 255], dtype=uint8)
y_bool = np.unpackbits(y)
slow_popcount = np.sum(y_bool, dtype=int)
print(slow_popcount) # 1002. should be 1000

print("Fudging an answer. This is wrong as well.")
y = np.fromstring(gmpy2.to_binary(x)[2:], dtype=np.uint8)
# is that slicing [2:] a slow operation?
y_bool = np.unpackbits(y)
print np.sum(y_bool, dtype=int) # 1000

More tests:

np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*64,2))), dtype=np.uint8)
# array([  1,   1, 255, 255, 255, 255, 255, 255, 255, 255], dtype=uint8)
np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*65,2))), dtype=np.uint8)
# array([  1,   1, 255, 255, 255, 255, 255, 255, 255, 255,   1], dtype=uint8
np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*66,2))), dtype=np.uint8)
# array([  1,   1, 255, 255, 255, 255, 255, 255, 255, 255,   3], dtype=uint8)
np.fromstring(gmpy2.to_binary(gmpy2.mpz(int('1'*1024,2))), dtype=np.uint8)
# array([  1,   1, 255 ... 255], dtype=uint8)

By the way, I actually want to quickly get a list, array, or numpy array of indices of all set bit of a gmpy2.mpz. The actual 4,777,000 gmpy2.mpz that I try to convert each has 760,000 bits with about 2,000 bits of 1. The gmp library on the computer was compiled with intel icc.

Thanks


Solution

  • There are a couple of options. The function gmpy2.bit_scan1(x, n) will return the index of the first bit that is set that has an index >= n.

    >>> x = gmpy2.mpz(123456)
    >>> bin(x)
    '0b11110001001000000'
    >>> n = 0
    >>> while True:
    ...     n = gmpy2.bit_scan1(x, n)
    ...     if n is None:
    ...         break
    ...     print(n)
    ...     n = n + 1
    ... 
    6
    9
    13
    14
    15
    16
    

    The gmpy2 also supports an integer type called xmpz. It is an experimental version of the mpz type. The primary difference is that xmpz type is mutable - in-place operations will directly modify the value without creating a copy. This makes the xmpz type very useful for bit manipulations. For example, you can extract and modify bit positions using slice notation.

    The xmpz type also supports methods called iter_set, iter_clear, and iter_bits.

    >>> x_str='1'*8+'01'
    >>> x_int=gmpy2.xmpz(x_str, 2)
    >>> list(x_int.iter_set())
    [0, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> list(x_int.iter_clear())
    [1]
    >>> list(x_int.iter_bits())
    [True, False, True, True, True, True, True, True, True, True]
    

    I originally wrote the xmpz type to evaluate any performance improvements for optimizing in-place operations. Bit manipulation saw the greatest benefits. Here is a short and fast implementation of the Sieve of Eratosthenes.

    def sieve(limit=1000000):
        '''Returns a generator that yields the prime numbers up to limit.'''
    
        sieve_limit = gmpy2.isqrt(limit) + 1
        limit += 1
        # Mark bit positions 0 and 1 as not prime.
        bitmap = gmpy2.xmpz(3)
        # Process 2 separately. This allows us to use p+p for the step size
        # when sieving the remaining primes.
        bitmap[4 : limit : 2] = -1
        # Sieve the remaining primes.
        for p in bitmap.iter_clear(3, sieve_limit):
            bitmap[p*p : limit : p+p] = -1
        return bitmap.iter_clear(2, limit)