Tags: python, numpy, scientific-computing

A grid over probability vectors


I am trying to get a "grid" of n-dimensional probability vectors: vectors in which every entry is between 0 and 1 and all entries add up to 1. I want every possible vector whose coordinates each take one of v evenly spaced values between 0 and 1.

To illustrate, here is a horribly inefficient implementation for n = 3 and v = 3:

from itertools import product
grid_redundant = product([0, .5, 1], repeat=3)
grid = [point for point in grid_redundant if sum(point)==1]

Now grid contains [(0, 0, 1), (0, 0.5, 0.5), (0, 1, 0), (0.5, 0, 0.5), (0.5, 0.5, 0), (1, 0, 0)].

This "implementation" is hopeless for higher-dimensional, finer-grained grids: it enumerates all v**n candidate points and keeps only the few that sum to 1. Is there a good way to do this, perhaps using numpy?


A point on motivation: I would be perfectly happy if simply sampling from a random distribution gave me sufficiently extreme points, but it does not; see this question. The "grid" I am after is not random but systematically sweeps the simplex (the space of probability vectors).


Solution

  • Here is a recursive solution. It does not use NumPy and is not super efficient either, although it should be faster than the posted snippet:

    import math
    from itertools import permutations
    
    def probability_grid(values, n):
        values = set(values)
        # Check if we can extend the probability distribution with zeros
        with_zero = 0. in values
        values.discard(0.)
        if not values:
            # A bare return ends the generator; raising StopIteration here
            # would be a RuntimeError under PEP 479 (Python 3.7+).
            return
        values = list(values)
        for p in _probability_grid_rec(values, n, [], 0.):
            if with_zero:
                # Add necessary zeros
                p += (0.,) * (n - len(p))
            if len(p) == n:
                yield from set(permutations(p))  # faster: more_itertools.distinct_permutations(p)
    
    def _probability_grid_rec(values, n, current, current_sum, eps=1e-10):
        if not values or n <= 0:
            if abs(current_sum - 1.) <= eps:
                yield tuple(current)
        else:
            value, *values = values
            inv = 1. / value
            # Skip this value
            yield from _probability_grid_rec(
                values, n, current, current_sum, eps)
            # Add copies of this value
            precision = round(-math.log10(eps))  # decimal digits implied by eps
            adds = int(round((1. - current_sum) / value, precision))  # copies of value that still fit
            for i in range(adds):
                current.append(value)
                current_sum += value
                n -= 1
                yield from _probability_grid_rec(
                    values, n, current, current_sum, eps)
            # Remove copies of this value
            if adds > 0:
                del current[-adds:]
    
    print(list(probability_grid([0, 0.5, 1.], 3)))
    

    Output:

    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5)]
    
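    As the inline comment hints, set(permutations(p)) still materializes all len(p)! orderings before deduplicating. For vectors with many repeated entries, more_itertools.distinct_permutations yields each distinct ordering exactly once. A tiny demonstration (assuming the more-itertools package is installed):

    from itertools import permutations
    from more_itertools import distinct_permutations
    
    p = (0.5, 0.5, 0., 0.)
    print(len(list(permutations(p))))           # 24 orderings, mostly duplicates
    print(len(list(distinct_permutations(p))))  # 6 distinct orderings
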

    Quick comparison with the posted method:

    from itertools import product
    
    def probability_grid_basic(values, n):
        grid_redundant = product(values, repeat=n)
        return [point for point in grid_redundant if sum(point)==1]
    
    values = [0, 0.25, 1./3., .5, 1]
    n = 6
    %timeit list(probability_grid(values, n))
    1.61 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    %timeit probability_grid_basic(values, n)
    6.27 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
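
    Since the question explicitly asks about NumPy: the grid over v evenly spaced values is exactly the set of integer compositions of steps = v - 1 into n nonnegative parts, scaled by 1/steps, so it can also be built directly with a stars-and-bars construction and no filtering at all. A sketch of that idea (the function probability_grid_np is my name for it, not part of the answer above):

    import numpy as np
    from itertools import combinations
    
    def probability_grid_np(n, steps):
        """All n-vectors whose entries are multiples of 1/steps and sum to 1."""
        # Stars and bars: choosing n - 1 "bar" positions among steps + n - 1
        # slots splits `steps` units into n ordered nonnegative integer parts.
        bars = np.array(list(combinations(range(steps + n - 1), n - 1)), dtype=int)
        padded = np.hstack([
            np.full((len(bars), 1), -1),             # virtual bar before the first slot
            bars,
            np.full((len(bars), 1), steps + n - 1),  # virtual bar after the last slot
        ])
        counts = np.diff(padded, axis=1) - 1  # gaps between bars; each row sums to steps
        return counts / steps
    
    print(probability_grid_np(3, 2))  # the six points of the n = 3, v = 3 example

    This produces exactly C(steps + n - 1, n - 1) rows, one per grid point, so nothing is generated and then thrown away.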