python numpy vectorization

bootstrap numpy 2D array


I am trying to resample the rows of a base 2D numpy array of shape (4,2) with replacement, say 10 times. The final output should be a 3D numpy array.

I have tried the code below, and it works. But is there a way to do it without the for loop?

import numpy as np

base = np.array([[20,30],[50,60],[70,80],[10,30]])
print(base.shape)
nsample = 10
# one (rows, cols) slice per bootstrap sample along the last axis
tmp = np.zeros((base.shape[0], base.shape[1], nsample))
for i in range(nsample):
    # draw row indices with replacement
    id_pick = np.random.choice(base.shape[0], size=base.shape[0])
    print(id_pick)
    boot1 = base[id_pick,:]
    tmp[:,:,i] = boot1
print(tmp)

Solution

  • Here's one vectorized approach -

    m,n = base.shape
    idx = np.random.randint(0,m,(m,nsample))
    out = base[idx].swapaxes(1,2)
    

    The basic idea is to generate all the random row indices at once with np.random.randint, as idx. That is an array of shape (m,nsample). Indexing base with it along the first axis selects random rows of base and gives an array of shape (m,nsample,n). To get the final output with a shape of (m,n,nsample), we swap the last two axes.
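
For reference, here is a minimal self-contained sketch of the same idea, wrapped in a function so the shapes can be checked. The name `bootstrap_rows` and the optional `rng` argument are my own choices for illustration, not part of the answer above, and the sketch uses NumPy's `Generator.integers` in place of `np.random.randint`, which should behave equivalently here.

```python
import numpy as np

def bootstrap_rows(base, nsample, rng=None):
    """Resample the rows of `base` with replacement, `nsample` times.

    Returns an array of shape (m, n, nsample), where (m, n) = base.shape.
    `bootstrap_rows` and `rng` are hypothetical names used for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    m, n = base.shape
    # idx[i, k] is the source row used for row i of bootstrap sample k
    idx = rng.integers(0, m, size=(m, nsample))
    # base[idx] has shape (m, nsample, n); move the sample axis to the end
    return base[idx].swapaxes(1, 2)

base = np.array([[20, 30], [50, 60], [70, 80], [10, 30]])
out = bootstrap_rows(base, nsample=10, rng=np.random.default_rng(0))
print(out.shape)  # (4, 2, 10)
```

Each slice `out[:, :, k]` plays the role of `tmp[:, :, i]` in the loop version. One difference to be aware of: the loop fills a float array created by `np.zeros`, while this version keeps the integer dtype of `base`.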