Tags: python, pandas, sample

Create new df columns in a for loop


I have a DataFrame df with a column 'x' that I want to sample from, storing the result in a new DataFrame df_pull. I want to repeat this in a for loop, e.g. 10 times. My problem is the error "name 'df_pull' is not defined". Sure, this is because I did not define df_pull, but how do I create an empty DataFrame? This is not possible, right? I was successful by creating a lot of lists, but I am sure this is not the best solution.

for i in np.arange(10):
    df_pull[[i]] = df['x'].sample(frac=1)

Thank you.


Solution

  • Use a list comprehension with concat. It is also important to call DataFrame.reset_index with drop=True; otherwise index alignment matches the rows back up by label and every column ends up with the same values:

    r = np.arange(10)
    L = [df['x'].sample(frac=1).reset_index(drop=True) for i in r]
    df_pull = pd.concat(L, axis=1, keys=r)
    

    Your solution with an empty DataFrame, again using DataFrame.reset_index:

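    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'y': [7, 8, 9, 4, 2, 3],
        'x': [1, 3, 5, 7, 1, 0],
    })

    # Start from an empty DataFrame and add one shuffled column per iteration
    df_pull = pd.DataFrame()
    for i in np.arange(10):
        # drop the shuffled index so the values are assigned by position
        df_pull[i] = df['x'].sample(frac=1).reset_index(drop=True)

    # sampling is random, so the values differ between runs
    print(df_pull)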
       0  1  2  3  4  5  6  7  8  9
    0  1  7  1  1  1  5  3  5  3  1
    1  7  1  5  5  0  1  1  1  7  7
    2  5  0  0  7  1  3  5  3  1  5
    3  3  3  3  0  3  0  7  1  1  3
    4  0  1  7  1  5  7  1  7  5  1
    5  1  5  1  3  7  1  0  0  0  0
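
To see why reset_index(drop=True) matters, here is a minimal sketch of the index alignment effect. The names shuffled and aligned, the column labels 'no_reset' and 'reset', and random_state=0 are only chosen for this illustration and are not part of the question:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 3, 5, 7, 1, 0]})

# Shuffle the column; each value keeps its original index label.
# random_state is fixed here only to make the illustration repeatable.
shuffled = df['x'].sample(frac=1, random_state=0)

aligned = pd.DataFrame(index=df.index)

# Assignment aligns on index labels, so every value goes back to the
# row it came from and the shuffle is effectively undone.
aligned['no_reset'] = shuffled

# After dropping the old index, the values are assigned by position,
# so the shuffled order is kept.
aligned['reset'] = shuffled.reset_index(drop=True)

print(aligned)
```

The 'no_reset' column should come out identical to df['x'], while 'reset' holds the same values in shuffled order.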