python matrix cython linear-algebra finite-field

Fast computation of matrix rank over GF(2)

For my current project I need to be able to calculate the rank of 64*64 matrices with entries from GF(2). I was wondering if anyone had a good solution.

I've been using pyfinite for this, but it is rather slow since it's a pure python implementation. I've also tried to cythonise the code I've been using, but have had issues due to relying on pyfinite.

My next idea would be to write my own class in cython, but that seems a bit overkill for what I need.

I need the following functionality

matrix = GF2Matrix(size=64) # creating a 64*64 matrix
matrix.setRow(i, [1,0,1....,1]) # set row using list
matrix += matrix2 # addition of matrices
rank(matrix) # then computing the rank

Thanks for any ideas.

Solution

One way to efficiently represent a matrix over GF(2) is to store the rows as integers, interpreting each integer as a bit-string. So for example, the 4-by-4 matrix

[0 1 1 0]
[1 0 1 1]
[0 0 1 0]
[1 0 0 1]

(which has rank 3) could be represented as a list [6, 13, 4, 9] of integers. Here I'm thinking of the first column as corresponding to the least significant bit of the integer, and the last to the most significant bit, but the reverse convention would also work.

With this representation, row operations can be performed efficiently using Python's bitwise integer operations: ^ for addition, & for multiplication. Then you can compute the rank using a standard Gaussian elimination approach.

Here's some reasonably efficient code. Given a collection rows of nonnegative integers representing a matrix as above, we repeatedly remove the last row in the list, and then use that row to eliminate all 1 entries from the column corresponding to its least significant bit. If the row is zero then it has no least significant bit and doesn't contribute to the rank, so we simply discard it and move on.

def gf2_rank(rows):
    """
    Find rank of a matrix over GF2.

    The rows of the matrix are given as nonnegative integers, thought
    of as bit-strings.

    This function modifies the input list. Use gf2_rank(rows.copy())
    instead of gf2_rank(rows) to avoid modifying rows.
    """
    rank = 0
    while rows:
        pivot_row = rows.pop()
        if pivot_row:
            rank += 1
            lsb = pivot_row & -pivot_row
            for index, row in enumerate(rows):
                if row & lsb:
                    rows[index] = row ^ pivot_row
    return rank

Let's run some timings for random 64-by-64 matrices over GF2. random_matrices is a function to create a collection of random 64-by-64 matrices:

import random

def random_matrix():
    return [random.getrandbits(64) for row in range(64)]

def random_matrices(count):
    return [random_matrix() for _ in range(count)]

and here's the timing code:

import timeit

count = 1000
number = 10
timer = timeit.Timer(
    setup="ms = random_matrices({})".format(count),
    stmt="[gf2_rank(m.copy()) for m in ms]",
    globals=globals())
print(min(timer.repeat(number=number)) / count / number)

The result printed on my machine (2.7 GHz Intel Core i7, macOS 10.14.5, Python 3.7) is 0.0001984686384, so that's a touch under 200µs for a single rank computation.

200µs is quite respectable for a pure Python rank computation, but in case this isn't fast enough, we can follow your suggestion to use Cython. Here's a Cython function that takes a 1d NumPy array of dtype np.uint64, again thinking of each element of the array as a row of your 64-by-64 matrix over GF2, and returns the rank of that matrix.

# cython: language_level=3, boundscheck=False

from libc.stdint cimport uint64_t, int64_t

def gf2_rank(uint64_t[:] rows):
    """
    Find rank of a matrix over GF2.

    The matrix can have no more than 64 columns, and is represented
    as a 1d NumPy array of dtype `np.uint64`. As before, each integer
    in the array is thought of as a bit-string to give a row of the
    matrix over GF2.

    This function modifies the input array.
    """
    cdef size_t i, j, nrows, rank
    cdef uint64_t pivot_row, row, lsb

    nrows = rows.shape[0]

    rank = 0
    for i in range(nrows):
        pivot_row = rows[i]
        if pivot_row:
            rank += 1
            lsb = pivot_row & -pivot_row
            for j in range(i + 1, nrows):
                row = rows[j]
                if row & lsb:
                    rows[j] = row ^ pivot_row

    return rank

Running equivalent timings for 64-by-64 matrices, now represented as NumPy arrays of dtype np.uint64 and shape (64,), I get an average rank-computation time of 7.56µs, over 25 times faster than the pure Python version.