Generate random numbers from a distribution given by a list of numbers in python


Let's say I have a list of (float) numbers:

list_numbers = [0.27,0.26,0.64,0.61,0.81,0.83,0.78,0.79,0.05,0.12,0.07,0.06,0.38,0.35,0.04,0.03,0.46,0.01,0.18,0.15,0.36,0.36,0.26,0.26,0.93,0.12,0.31,0.28,1.03,1.03,0.85,0.47,0.77]

In my case, this is obtained from a pandas dataframe column, meaning that they are not bounded between any pair of values a priori.

The idea now is to obtain a new list of randomly-generated numbers, which follow the same distribution, meaning that, for a sufficiently large sample, both lists should have fairly similar histograms.

I tried using np.random.choice, but it only returns values that are already in the original list. I want new values, which may or may not appear in it, that follow the same distribution, so it does not work.


Solution

  • As already mentioned, the list is relatively small, so it is hard to decide what the distribution looks like. Even so, the following code might provide a solution to your problem:

    import numpy as np
    import matplotlib.pyplot as plt
    
    # Original list of numbers
    list_numbers = [0.27,0.26,0.64,0.61,0.81,0.83,0.78,0.79,0.05,0.12,0.07,0.06,0.38,0.35,0.04,0.03,0.46,0.01,0.18,0.15,0.36,0.36,0.26,0.26,0.93,0.12,0.31,0.28,1.03,1.03,0.85,0.47,0.77]
    
    # Construct histogram using 10 bins
    counts, bin_edges = np.histogram(list_numbers, bins=10)
    
    # Sample new numbers using the histogram: each bin is picked with
    # probability proportional to its count.
    # Note: this returns the left edge of the chosen bin, so at most
    # 10 distinct values can appear in new_numbers.
    new_numbers = np.random.choice(bin_edges[:-1], size=len(list_numbers), p=counts/counts.sum())
    
    # Create histograms for original and new numbers (alpha keeps both visible)
    plt.hist(list_numbers, bins=bin_edges, alpha=0.5, label="Original list_numbers")
    plt.hist(new_numbers, bins=bin_edges, alpha=0.5, label="New list_numbers")
    
    # Add labels and legend
    plt.xlabel("Value")
    plt.ylabel("Count")
    plt.legend()
    
    # Show the plot
    plt.show()
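
The histogram approach above only ever returns bin edges. If the goal is to get values that are not limited to a fixed set, one possible variant is to pick a bin in the same way and then draw uniformly inside it. The following is a minimal sketch under that assumption (the distribution is treated as flat within each bin); the helper name sample_from_histogram and its parameters are made up for illustration and are not part of NumPy.

```python
import numpy as np

def sample_from_histogram(values, size, bins=10, rng=None):
    """Draw `size` new values from a piecewise-uniform fit to `values`.

    Hypothetical helper for illustration: it assumes the density is
    constant inside each histogram bin.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts, bin_edges = np.histogram(values, bins=bins)
    probabilities = counts / counts.sum()

    # Pick a bin index for every sample, weighted by the bin counts
    bin_indices = rng.choice(len(counts), size=size, p=probabilities)

    # Draw uniformly between the left and right edge of each chosen bin
    return rng.uniform(bin_edges[bin_indices], bin_edges[bin_indices + 1])

list_numbers = [0.27, 0.26, 0.64, 0.61, 0.81, 0.83, 0.78, 0.79, 0.05, 0.12,
                0.07, 0.06, 0.38, 0.35, 0.04, 0.03, 0.46, 0.01, 0.18, 0.15,
                0.36, 0.36, 0.26, 0.26, 0.93, 0.12, 0.31, 0.28, 1.03, 1.03,
                0.85, 0.47, 0.77]

# Seeded generator so the sketch is reproducible
rng = np.random.default_rng(seed=0)
new_numbers = sample_from_histogram(list_numbers, size=1000, rng=rng)
```

The samples stay within the range of the original data, and the number of bins controls how closely the result follows the original histogram. With so few data points, a smoother estimate such as a kernel density estimate (for example scipy.stats.gaussian_kde and its resample method) may be worth trying instead.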