I have a sorted list that I need to sample from. I want to favor the items towards each end of the list. In other words, I want to sample from the list using a negated normal function such that the first and last entries of the list are chosen more frequently than items in the middle of the list. I tried this:
slots = np.floor(np.random.normal(scale=len(children)//2, size=max_children)) - max_children//2
return children[slots]
However, it returns numbers that are out of range. It also returns duplicate numbers. What can I do better?
Since you are working with a list of discrete values, I would argue that you should work with a multinomial distribution over the list indices instead. In NumPy this can be done conveniently with the np.random.choice method, which directly takes the probability associated with each entry. Because it only ever picks from the entries you pass in, it cannot return out-of-range values. Here is a minimal example:
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt
random_state = np.random.RandomState(4873)
children = np.arange(20)
p = norm.pdf(np.arange(len(children)), loc=(len(children) - 1) / 2, scale=10)
p = p.max() - p + 0.01  # invert the bell curve; the small offset keeps the middle entries from ending up with zero probability
samples = random_state.choice(children, p=p / p.sum(), size=10_000)
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].bar(children, p)
axes[0].set_title("Probability mass function")
axes[1].hist(samples, bins=np.arange(len(children) + 1) - 0.5, density=True, alpha=0.5)
axes[1].set_title("Samples")
I also think random.choice makes the intention of drawing samples from the list much clearer.
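The question also mentions duplicates. If each entry should be picked at most once, random.choice accepts replace=False. Below is a rough sketch of that; max_children = 5 is a value I made up for illustration (the question does not state it), and the normal density is written out by hand so the snippet only needs NumPy:

```python
import numpy as np

random_state = np.random.RandomState(4873)
children = np.arange(20)
max_children = 5  # hypothetical value; must not exceed len(children)

# Inverted normal weights over the indices, as in the example above
idx = np.arange(len(children))
loc = (len(children) - 1) / 2
scale = 10
pdf = np.exp(-0.5 * ((idx - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))
p = pdf.max() - pdf + 0.01  # small offset keeps every entry selectable
p = p / p.sum()

# replace=False guarantees that no index is drawn twice
slots = random_state.choice(len(children), size=max_children, replace=False, p=p)
picked = children[np.sort(slots)]  # sort the indices to keep the original list order
print(picked)
```

Note that with replace=False the entries at the ends are still favored, but the per-draw probabilities no longer match p exactly, because every draw removes one candidate.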
However, the suggestion of possibly using a beta distribution is still valid. In that case you would still convert it to a probability mass function over the indices, as shown above.
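For the beta variant, a minimal sketch could look like the following. The shape parameters a = b = 0.5 are my assumption (any a = b < 1 gives a symmetric U shape), and I evaluate the unnormalized density at bin midpoints because the density diverges at exactly 0 and 1:

```python
import numpy as np

random_state = np.random.RandomState(4873)
children = np.arange(20)
n = len(children)

a = b = 0.5  # assumed shape parameters; a = b < 1 puts more mass at both ends

# Map each index to the midpoint of its bin in (0, 1) to avoid the endpoints
x = (np.arange(n) + 0.5) / n

# Unnormalized beta density; the constant cancels when normalizing below
density = x ** (a - 1) * (1 - x) ** (b - 1)
p = density / density.sum()  # probability mass function over the indices

samples = random_state.choice(children, p=p, size=10_000)
```

Smaller values of a and b concentrate more mass at the ends, while choosing a different from b would favor one end over the other.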
I hope this helps!