Tags: python, random, iterator, python-itertools

Python itertools create iterator of random subset


I have an iterator itertools.combinations(big_matrix, 50) with big_matrix.shape = (65, x), so there are about 10^14 combinations. I want to get a random subset of, say, 10000 of these combinations, also as an iterator, to save memory.

I tried the itertools recipe

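import random

def random_combination(iterable, r):
  "Random selection from itertools.combinations(iterable, r)"
  pool = tuple(iterable)  # materializes the whole input
  n = len(pool)
  # range replaces Python 2's xrange; sorting keeps the input order
  indices = sorted(random.sample(range(n), r))
  return tuple(pool[i] for i in indices)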

but when I pass it the combinations iterator, tuple(iterable) tries to build a tuple of all 10^14 values, and the function returns a single tuple rather than an iterator.

random.sample does not work, because it is unable to get the number of elements in the itertools.combinations object.

Is there any way to do this?


Solution

  • Just produce random combinations, tracking what you've seen before:

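    import random

    def random_combinations(matrix, size):
        seen = set()  # index tuples already yielded
        n = len(matrix)
        while True:
            # Sample row indices, not rows; the sorted tuple is a canonical key.
            # range replaces Python 2's xrange.
            new_sample = tuple(sorted(random.sample(range(n), size)))
            if new_sample not in seen:
                seen.add(new_sample)
                yield tuple(matrix[i] for i in new_sample)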
    

    Iterating through all possible combinations in order to sample from them is not efficient; you would still end up visiting all 10^14 combinations.

    The generator above yields a new random combination each time you iterate and never stops on its own. If you need a certain number, use a loop or itertools.islice(). Picking 10 random combinations would be:

    from itertools import islice

    combinations_sample = list(islice(random_combinations(matrix, 50), 10))
    

    You may have misunderstood what the function you found does; it does much the same as my function above but produces just the one random combination, without tracking what was produced before. You were supposed to use it on matrix, not on all combinations of matrix.
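
Since the question asks for 10000 combinations as an iterator rather than a list, here is a minimal Python 3 sketch of the same idea with the islice() result left lazy. The 65-row matrix is a made-up stand-in for big_matrix, and total and count are names used only for this illustration; math.comb needs Python 3.8 or later.

```python
import random
from itertools import islice
from math import comb  # available from Python 3.8


def random_combinations(matrix, size):
    """Yield distinct random size-row combinations of matrix, indefinitely."""
    seen = set()
    n = len(matrix)
    while True:
        # A sorted tuple of row indices identifies a combination uniquely.
        new_sample = tuple(sorted(random.sample(range(n), size)))
        if new_sample not in seen:
            seen.add(new_sample)
            yield tuple(matrix[i] for i in new_sample)


# Hypothetical stand-in for big_matrix: 65 rows of 3 values each.
matrix = [(i, i + 1, i + 2) for i in range(65)]

# Number of possible 50-row combinations (on the order of 10^14).
total = comb(len(matrix), 50)

# islice keeps the sample lazy: a combination is built only when consumed.
sample = islice(random_combinations(matrix, 50), 10000)

count = 0
for combo in sample:
    count += 1  # replace with the real per-combination work
```

Only the seen set grows with the number of samples drawn, and it stores short tuples of indices rather than matrix rows. With 10000 draws out of roughly 10^14 possibilities, repeated draws should be extremely rare, so the uniqueness check costs very little here.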