
Python weighted random choices from lists with different probability, comparing two Pandas DataFrame from CSVs

Python beginner here. I am trying to take a pandas DataFrame (created from a CSV) and use weighted random choices to pick values based on another DataFrame (also created from a CSV). My two pandas DataFrames look something like this:

Weighted percentages of codes:

SECTION  CODE  Final_Per
B1       800   5%
B1       801   65%
B1       802   30%
B2       900   30%
B2       901   70%
B3       600   50%
B3       601   50%

Input DataFrame to apply the weighted percentages to:

SECTION  NUMBER
B1       14
B2       25
B3       12
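A minimal sketch of loading these two tables and turning the percentage strings into probabilities, assuming the CSV files have header rows named SECTION, CODE, Final_Per and SECTION, NUMBER (an assumption; the real files may use other names). io.StringIO stands in for the real CSV paths here.

```python
import io

import pandas as pd

# Sample data typed from the question; column names are assumed.
weights_csv = io.StringIO(
    "SECTION,CODE,Final_Per\n"
    "B1,800,5%\n"
    "B1,801,65%\n"
    "B1,802,30%\n"
    "B2,900,30%\n"
    "B2,901,70%\n"
    "B3,600,50%\n"
    "B3,601,50%\n"
)
counts_csv = io.StringIO("SECTION,NUMBER\nB1,14\nB2,25\nB3,12\n")

df1 = pd.read_csv(weights_csv)
df2 = pd.read_csv(counts_csv)

# "5%" -> 0.05: strip the trailing percent sign, convert to float, divide by 100.
df1["prob"] = df1["Final_Per"].str.rstrip("%").astype(float) / 100

# numpy's choice() raises ValueError if p does not sum to 1,
# so check every section before sampling.
totals = df1.groupby("SECTION")["prob"].sum()
bad = totals[(totals - 1).abs() > 1e-9]
if not bad.empty:
    raise ValueError(f"weights do not sum to 100% for: {list(bad.index)}")

print(df1)
print(df2)
```

Catching a section whose percentages don't add up to 100% at load time gives a clearer error than the one numpy raises later during sampling.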

These are just samples of my tables, not the full tables. What I need to do is store these weighted probabilities (in a dictionary, lists, or pandas DataFrames; I'm not sure which is best), then take the second table above, apply the 'Final_Per' percentages to the 'NUMBER' column, and output the result. So B1's result would be 14 values, each chosen as code 800 with 5% probability, code 801 with 65%, and code 802 with 30%. Currently the tables are CSVs, which I read into pandas DataFrames, and I have tried to apply some lessons from this article without success. Does anybody have suggestions on how to handle this correctly? Thank you.
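If the tables are small, plain dictionaries are enough. The sketch below is one possible approach, not the only one: the weights and counts are hard-coded from the sample tables (the names `weights` and `numbers` are hypothetical), and it uses the standard library's random.choices, which draws k items with replacement according to relative weights.

```python
import random
from collections import Counter

# Hypothetical in-memory version of the two tables:
# section -> {code: percentage weight}, and section -> how many values to draw.
weights = {
    "B1": {800: 5, 801: 65, 802: 30},
    "B2": {900: 30, 901: 70},
    "B3": {600: 50, 601: 50},
}
numbers = {"B1": 14, "B2": 25, "B3": 12}

rng = random.Random(42)  # seeded so the run is reproducible

results = {}
for section, n in numbers.items():
    codes = list(weights[section])
    # weights are relative, so raw percentages work without dividing by 100
    results[section] = rng.choices(codes, weights=list(weights[section].values()), k=n)

for section, drawn in results.items():
    print(section, Counter(drawn))
```

Because each value is drawn at random, the counts per code only approximate the percentages. For example, 5% of 14 is 0.7, so an exact split isn't possible anyway.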


  • Another way is to load the CSV files into DataFrames, merge them on SECTION, group by SECTION, and use .apply to draw NUMBER codes for each group.

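    import pandas as pd
    from numpy.random import choice

    df1 = pd.read_csv('/path/to/csv1')  # columns: SECTION, CODE, Final_Per
    df2 = pd.read_csv('/path/to/csv2')  # columns: SECTION, NUMBER

    def calculate_distribution(mini_df):
        # "5%" -> 0.05: drop the trailing '%' and scale to a probability
        prob = mini_df.Final_Per.str[:-1].astype(float) / 100
        # draw NUMBER codes for this section, weighted by prob
        return choice(mini_df.CODE.values, mini_df.NUMBER.values[0], p=prob)

    distributions = df1.merge(df2, on='SECTION').groupby('SECTION').apply(calculate_distribution)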
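For reference, here is a self-contained sketch of the same merge-and-group idea run on the sample data from the question. It is a variant, not the answer's exact code. It loops over the groups instead of using .apply, uses numpy's seeded Generator (np.random.default_rng) instead of numpy.random.choice, renormalises the probabilities, and returns one row per drawn value. Column names are assumed to match the answer's code.

```python
import io

import pandas as pd
import numpy as np

# Sample data from the question; column names assumed to match the answer.
df1 = pd.read_csv(io.StringIO(
    "SECTION,CODE,Final_Per\n"
    "B1,800,5%\nB1,801,65%\nB1,802,30%\n"
    "B2,900,30%\nB2,901,70%\n"
    "B3,600,50%\nB3,601,50%\n"
))
df2 = pd.read_csv(io.StringIO("SECTION,NUMBER\nB1,14\nB2,25\nB3,12\n"))

rng = np.random.default_rng(0)  # seeded Generator for reproducible draws

merged = df1.merge(df2, on="SECTION")
merged["prob"] = merged["Final_Per"].str.rstrip("%").astype(float) / 100

rows = []
for section, group in merged.groupby("SECTION"):
    n = int(group["NUMBER"].iloc[0])
    # renormalise so float rounding can't make p fail the sum-to-1 check
    p = group["prob"].to_numpy()
    p = p / p.sum()
    drawn = rng.choice(group["CODE"].to_numpy(), size=n, p=p)
    rows.append(pd.DataFrame({"SECTION": section, "CODE": drawn}))

# long format: one row per drawn value
result = pd.concat(rows, ignore_index=True)
print(result.groupby(["SECTION", "CODE"]).size())
```

The long-format result is easy to write back out with result.to_csv(...). As with any random draw, the per-code counts only approximate the percentages. Changing the seed gives a different sample.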