Search code examples
pythonnumpymatrixrandomprobability

Are there some functions in Python for generating matrices with special conditions?


I'm writing dataset generator on Python and I got following problem: I need a set of zero-one matrices with no empty columns/rows. Also the ratio between zeros and ones should be constant.

I've tried to shuffle zero-one list with fixed ratio of zeros and ones with following reshaping, but for matrices with hundreds of rows/cols it's too long. Also I took into account that I can't achieve some inputs like 3*10 matrix with 9 one-elements and that some inputs can have only solution like 10*10 matrix with 10 one-elements.


Solution

  • If I understand the task, something like this might work:

    import numpy as np
    from collections import defaultdict, deque
    
    def gen_mat(n, m, k):
        """
        n: rows,
        m: cols,
        k: ones,
        """
        assert k % n == 0 and k % m == 0
        mat = np.zeros((n, m), dtype=int)
    
        ns = np.repeat(np.arange(n), k // n)
        ms = np.repeat(np.arange(m), k // m)
        # uniform shuffle
        np.random.shuffle(ms)
        ms_deque = deque(ms)
    
        assigned = defaultdict(set)
        for n_i in ns:
            while True:
                m_i = ms_deque.popleft()
                if m_i in assigned[n_i]:
                    ms_deque.append(m_i)
                    continue
                mat[n_i, m_i] = 1
                assigned[n_i].add(m_i)
                break
    
        return mat
    

    We first observe that an n x m matrix can be populated with k ones s.t. equal ratios only k is divisible by both n and m.

    Assuming this condition holds, each row index will appear k/n times and each column index will appear m/k times. We shuffle the column indices to ensure that the assignment is random, and store the random column indices in a deque for efficiency.

    For each row, we store a set of columns s.t. mat[row, column] = 1 (initially empty). We can now loop over each row k/n times, picking the next column s.t. mat[row, column] = 0 from the deque and set mat[row, column] to 1.

    Without loss, assume that n <= m. This algorithm terminates successfully unless we encounter a situation when all remaining columns in the deque satisfy mat[row, column] = 1. This can only happen in the last row, meaning that we have already assigned k/m + 1 ones to some column, which is impossible.