Suppose I have the following numpy array:
Space = np.arange(7)
Question: How could I generate a set of N samples from Space such that each sample starts at a uniformly random state, moves by +1 or -1 at each step, and stops once it reaches either endpoint (0 or 6)?
In essence I'm creating a Markov reward process via numpy sampling. (There is probably a more suitable package for this, but I'm not sure what it would be.) For example, if N = 3, a possible sampled set would look something like this:
Sample = [[1,0],[4, 3, 4, 5, 6],[4, 3, 2, 1, 2, 1, 0]]
I can accomplish this with something not very elegant like this:
import numpy as np

Space = np.arange(7)

def sample_set(n_samples=3):
    N = len(Space)
    Set = []
    for i in range(n_samples):
        X = np.random.randint(N)      # uniform starting state
        if (X == 0) | (X == 6):
            Set.append(X)             # started on a terminal state
        else:
            Sample = []
            while (X != 0) & (X != 6):
                Next = np.array([X - 1, X + 1])
                X = np.random.choice(Next)
                Sample.append(X)
            Set.append(Sample)
    return Set
But I was wondering what a more efficient/pythonic way to go about this type of sampling would be, perhaps without so many loops? Or alternatively, are there better Python libraries for this sort of thing? Thanks.
Numpy doesn't seem to be helping much here; I'd just use the standard random module. The main reason is that random is much faster when working with single values, as this algorithm does, and there doesn't seem to be any need to pull in an extra dependency unless it's needed.
from random import randint, choice

def bounded_path(lo, hi):
    # r covers the interior space
    r = range(lo+1, hi)
    n = randint(lo, hi)
    result = [n]
    while n in r:
        n += choice((-1, 1))
        result.append(n)
    return result
This seems to do the right thing for me; e.g. evaluating bounded_path(0, 6) ten times, I get:
[0]
[4, 3, 4, 3, 2, 1, 0]
[5, 6]
[2, 3, 4, 3, 4, 5, 4, 3, 4, 3, 2, 1, 0]
[1, 0]
[1, 0]
[4, 3, 4, 3, 4, 3, 2, 3, 2, 1, 0]
[3, 2, 3, 2, 1, 0]
[6]
[4, 5, 4, 3, 4, 3, 2, 1, 0]
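If you want the full set of N paths from the question (with Space = np.arange(7), the bounds are lo=0 and hi=6), a simple list comprehension over bounded_path should do; the names N and Set below just mirror the question:

# collect N independent sampled paths over the states 0..6
N = 3
Set = [bounded_path(0, 6) for _ in range(N)]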
I just did a quick benchmark of the random number generation, comparing:
import numpy as np
from random import choice

def rng_np(X):
    # ten +/-1 steps, sampling each step with numpy
    for _ in range(10):
        X = np.random.choice(np.array([X - 1, X + 1]))
    return X

def rng_py(X):
    # ten +/-1 steps, sampling each step with the stdlib random module
    for _ in range(10):
        X += choice((-1, +1))
    return X
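For reference, this is roughly how the comparison can be reproduced with timeit; the exact ratio will depend on your machine and Numpy version, and the starting value 3 below is arbitrary:

from timeit import timeit

# time 10,000 ten-step walks with each implementation
t_np = timeit(lambda: rng_np(3), number=10000)
t_py = timeit(lambda: rng_py(3), number=10000)
print(t_np, t_py, t_np / t_py)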
The Numpy version is ~30 times slower. Numpy has to do lots of extra work each iteration: building a Python list, converting it to a Numpy array, and branching inside choice to allow for fancy vectorisation. Python knows that the (-1, +1) tuple in the vanilla version is constant, so it's built just once (e.g. the dis module is useful to see what's going on inside).
You might be able to get somewhere by working with larger blocks of numbers, but I doubt it would be much faster. Maintaining the uniformity of the starting point seems awkward, but you could probably do something if you were really careful! Numpy starts to break even when each call is vectorised over roughly 10 values, and really shines when you have more than 100 values.
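For what it's worth, here is a rough sketch of that larger-blocks idea: draw a whole block of +/-1 steps at once, take a cumulative sum, and cut the path at the first boundary hit, drawing another block if needed. The function name bounded_path_blocks and the block size of 64 are just made up for illustration:

import numpy as np

def bounded_path_blocks(lo=0, hi=6, block=64):
    # random walk from a uniform starting state until it hits lo or hi,
    # drawing +/-1 steps in vectorised blocks
    pos = np.random.randint(lo, hi + 1)
    path = [pos]
    while lo < pos < hi:
        steps = np.random.choice([-1, 1], size=block)  # a whole block of moves at once
        walk = pos + np.cumsum(steps)
        # indices where the walk has left the interior, if any
        hits = np.nonzero((walk <= lo) | (walk >= hi))[0]
        if hits.size:
            path.extend(walk[:hits[0] + 1].tolist())   # keep up to the first boundary hit
            pos = walk[hits[0]]
        else:
            path.extend(walk.tolist())
            pos = walk[-1]
    return path

Even then, the caveat above applies: for short walks the per-call overhead probably dominates, so any gain would only show up for long paths or many paths batched together.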