I'm writing dataset generator on Python and I got following problem: I need a set of zero-one matrices with no empty columns/rows. Also the ratio between zeros and ones should be constant.
I've tried to shuffle zero-one list with fixed ratio of zeros and ones with following reshaping, but for matrices with hundreds of rows/cols it's too long. Also I took into account that I can't achieve some inputs like 3*10 matrix with 9 one-elements and that some inputs can have only solution like 10*10 matrix with 10 one-elements.
If I understand the task, something like this might work:
import numpy as np
from collections import defaultdict, deque
def gen_mat(n, m, k):
"""
n: rows,
m: cols,
k: ones,
"""
assert k % n == 0 and k % m == 0
mat = np.zeros((n, m), dtype=int)
ns = np.repeat(np.arange(n), k // n)
ms = np.repeat(np.arange(m), k // m)
# uniform shuffle
np.random.shuffle(ms)
ms_deque = deque(ms)
assigned = defaultdict(set)
for n_i in ns:
while True:
m_i = ms_deque.popleft()
if m_i in assigned[n_i]:
ms_deque.append(m_i)
continue
mat[n_i, m_i] = 1
assigned[n_i].add(m_i)
break
return mat
We first observe that an n x m matrix can be populated with k ones s.t. equal ratios only k is divisible by both n and m.
Assuming this condition holds, each row index will appear k/n times and each column index will appear m/k times. We shuffle the column indices to ensure that the assignment is random, and store the random column indices in a deque for efficiency.
For each row, we store a set of columns s.t. mat[row, column] = 1 (initially empty). We can now loop over each row k/n times, picking the next column s.t. mat[row, column] = 0 from the deque and set mat[row, column] to 1.
Without loss, assume that n <= m. This algorithm terminates successfully unless we encounter a situation when all remaining columns in the deque satisfy mat[row, column] = 1. This can only happen in the last row, meaning that we have already assigned k/m + 1 ones to some column, which is impossible.