Tags: python, pandas, numpy, multi-index, random-seed

Making a multiindex df with dice throws


I'm experimenting with pandas and numpy. A tutorial in my course summed the results of two dice throws. The tutorial used pandas, but I also tried it with numpy and then compared the results.

    import numpy as np
    import pandas as pd

    throws = 50
    diepd = pd.DataFrame([1, 2, 3, 4, 5, 6])
    dienp = np.array([1, 2, 3, 4, 5, 6])
    np.random.seed(1)
    sum_np = [np.random.choice(dienp, 2, True).sum() for i in range(throws)]
    sum_pd = [diepd.sample(2, replace=True).sum().loc[0] for i in range(throws)]

    compare = pd.DataFrame(data={'sum_np': sum_np, 'sum_pd': sum_pd})

    compare

I'm having real difficulty understanding and manipulating MultiIndex DataFrames, so as an extra exercise I'd like to learn how to create one from these results, to see where they differ (since I'm using the same seed).

The row index would simply be the throw number (1 to throws). The columns would have two levels:

Level 0: two groups, the numpy results and the pandas results.

Level 1: three columns in each group: the two individual throws and their sum. For example, the two values from np.random.choice(dienp, 2, True) and from diepd.sample(2, replace=True), plus their respective sums.

         numpy                 pandas
    No.  throw1  throw2  sum   throw1  throw2  sum
    1    1       2       3     4       5       9
    2    2       3       5     6       1       7
    3    4       6       10    5       2       7

Any suggestions?


Solution

  • With your code as written, it is difficult to get the value of each die, because each throw draws both dice and sums them in a single expression. Instead, I loop over the throws, draw one die at a time, and append each value to its own list. I then build two separate tables and concatenate them. You can see my code below:

    import pandas as pd
    import numpy as np

    throws = 50
    diepd = pd.DataFrame([1, 2, 3, 4, 5, 6])
    dienp = np.array([1, 2, 3, 4, 5, 6])
    np.random.seed(1)

    # One list per column: index 0 = first die, 1 = second die, 2 = sum
    np_roll = [[] for _ in range(3)]
    pd_roll = [[] for _ in range(3)]

    for i in range(throws):
        for j in range(2):
            # Draw one die at a time so each individual value can be stored
            np_roll[j].append(np.random.choice(dienp, 1, True).sum())
            pd_roll[j].append(diepd.sample(1, replace=True).sum().loc[0])
        # Sum of the two dice for throw i
        np_roll[2].append(np_roll[0][i] + np_roll[1][i])
        pd_roll[2].append(pd_roll[0][i] + pd_roll[1][i])

    np_df = pd.DataFrame(data={'Roll 1': np_roll[0], 'Roll 2': np_roll[1], "Sum": np_roll[2]})
    pd_df = pd.DataFrame(data={'Roll 1': pd_roll[0], 'Roll 2': pd_roll[1], "Sum": pd_roll[2]})

    # keys= adds an outer column level, so the result has MultiIndex columns
    compare = pd.concat([np_df, pd_df], axis=1, keys=["Numpy", "Pandas"])

    # Show every column instead of truncating the printed table
    pd.set_option('display.max_columns', None)
    print(compare)
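
If you would rather avoid the nested loops, here is a rough sketch of a vectorized variant. It is only one possible approach. The helper names roll_numpy, roll_pandas and to_frame are made up for this example, and I am assuming that sample() falls back to numpy's global random state when no random_state is passed, which is why the seed is reset before the pandas draw. It also shows one way to pull out the rows where the two sums differ, using xs on the inner column level.

```python
import numpy as np
import pandas as pd

throws = 50
die = np.array([1, 2, 3, 4, 5, 6])


def roll_numpy(n):
    # One call draws all n pairs: shape (n, 2), one row per throw
    return np.random.choice(die, size=(n, 2), replace=True)


def roll_pandas(n):
    # Sample 2*n faces with replacement, then reshape into n pairs
    faces = pd.Series(die).sample(2 * n, replace=True)
    return faces.to_numpy().reshape(n, 2)


def to_frame(rolls):
    # Hypothetical helper: label the two dice and append their sum
    df = pd.DataFrame(rolls, columns=["throw1", "throw2"])
    df["sum"] = df["throw1"] + df["throw2"]
    return df


np.random.seed(1)
np_df = to_frame(roll_numpy(throws))
np.random.seed(1)  # assumption: reseeding puts pandas on the same starting state
pd_df = to_frame(roll_pandas(throws))

# keys= becomes level 0 of the columns; the frame's own columns become level 1
compare = pd.concat([np_df, pd_df], axis=1, keys=["numpy", "pandas"])
compare.index = pd.RangeIndex(1, throws + 1, name="No.")

# Take the "sum" column from every level-0 group: columns are numpy, pandas
sums = compare.xs("sum", axis=1, level=1)

# Keep only the throws where the two sums disagree
differ = compare[sums["numpy"] != sums["pandas"]]

print(compare.head())
print(len(differ), "throws differ")
```

A single group can be selected with compare["numpy"], and a single column with a tuple such as compare[("pandas", "throw1")].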