Let's say I have a list of (float) numbers:
list_numbers = [0.27,0.26,0.64,0.61,0.81,0.83,0.78,0.79,0.05,0.12,0.07,0.06,0.38,0.35,0.04,0.03,0.46,0.01,0.18,0.15,0.36,0.36,0.26,0.26,0.93,0.12,0.31,0.28,1.03,1.03,0.85,0.47,0.77]
In my case, this is obtained from a pandas dataframe column, meaning that they are not bounded between any pair of values a priori.
The idea now is to obtain a new list of randomly-generated numbers, which follow the same distribution, meaning that, for a sufficiently large sample, both lists should have fairly similar histograms.
I tried using np.random.choice, but it does not work for this: I do not want to draw values from the original list, but rather new values, which may or may not appear in it, that follow the same distribution.
As previously mentioned, the list is relatively small, so it is indeed hard to decide what the distribution looks like. Nevertheless, the following code might solve your problem:
import numpy as np
import matplotlib.pyplot as plt
# Original list of numbers
list_numbers = [0.27,0.26,0.64,0.61,0.81,0.83,0.78,0.79,0.05,0.12,0.07,0.06,0.38,0.35,0.04,0.03,0.46,0.01,0.18,0.15,0.36,0.36,0.26,0.26,0.93,0.12,0.31,0.28,1.03,1.03,0.85,0.47,0.77]
# Construct histogram using 10 bins
counts, bin_edges = np.histogram(list_numbers, bins=10)
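# Pick a bin for each new value with probability proportional to its count,
# then draw uniformly inside that bin so new values are not limited to the bin edges
bin_idx = np.random.choice(len(counts), size=len(list_numbers), p=counts / counts.sum())
new_numbers = np.random.uniform(bin_edges[bin_idx], bin_edges[bin_idx + 1])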
# Create semi-transparent histograms so the overlapping bars stay visible
plt.hist(list_numbers, bins=bin_edges, alpha=0.5, label="Original list_numbers")
plt.hist(new_numbers, bins=bin_edges, alpha=0.5, label="New list_numbers")
# Add labels and legend
plt.xlabel("Value")
plt.ylabel("Count")
plt.legend()
# Show the plot
plt.show()
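If the choice of bin count feels arbitrary, one possible alternative (not the approach above, just a sketch) is inverse transform sampling on the interpolated empirical quantile function: draw uniform numbers in [0, 1] and map them through np.quantile. The helper name sample_like and the seed argument are made up for illustration. Note that this only produces values between the minimum and maximum of the original data.

```python
import numpy as np

def sample_like(data, size, seed=None):
    """Draw `size` new values via the interpolated empirical quantile function.

    Hypothetical helper: uniform draws in [0, 1] are mapped through
    np.quantile, which interpolates linearly between the sorted data points.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 1.0, size=size)
    return np.quantile(np.asarray(data, dtype=float), u)

list_numbers = [0.27,0.26,0.64,0.61,0.81,0.83,0.78,0.79,0.05,0.12,0.07,0.06,0.38,0.35,0.04,0.03,0.46,0.01,0.18,0.15,0.36,0.36,0.26,0.26,0.93,0.12,0.31,0.28,1.03,1.03,0.85,0.47,0.77]

# Generate more values than the original list to compare histograms
new_numbers = sample_like(list_numbers, size=1000, seed=0)
print(new_numbers[:5])
```

With only 33 data points, neither this nor the histogram approach can recover the true underlying distribution; both just mimic the shape of the sample.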