I have a df with a column 'x' from which I want to sample data and store it in a new DataFrame df_pull. I want to repeat this process in a for loop, e.g. 10 times. My problem is the error NameError: name 'df_pull' is not defined. Sure, this is because I did not define df_pull, but how do I create an empty DataFrame? Is that not possible? I managed it by creating a lot of lists, but I am sure this is not the best solution.
for i in np.arange(10):
    df_pull[[i]] = df['x'].sample(frac=1)
Thank you.
Use a list comprehension with concat. It is also important to call Series.reset_index with drop=True; otherwise concat aligns the shuffled samples on their original index, and every column ends up with the same values:
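import numpy as np
import pandas as pd
r = np.arange(10)
# reset_index(drop=True) discards the shuffled index, so concat keeps the shuffled order
L = [df['x'].sample(frac=1).reset_index(drop=True) for i in r]
df_pull = pd.concat(L, axis=1, keys=r)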
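To see why reset_index(drop=True) matters, here is a minimal sketch using a small hypothetical Series s (not from the question). Without resetting, concat aligns each shuffled sample back on its index labels, so every column holds the same value for a given row label and the shuffle is effectively undone.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
s = pd.Series([1, 3, 5, 7, 1, 0], name='x')
r = np.arange(3)

# Without reset_index: concat aligns on index labels, so the columns match row for row
aligned = pd.concat([s.sample(frac=1) for i in r], axis=1, keys=r)

# With reset_index(drop=True): each column keeps its own shuffled order on rows 0..n-1
shuffled = pd.concat([s.sample(frac=1).reset_index(drop=True) for i in r], axis=1, keys=r)

print(aligned)
print(shuffled)
```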
Your approach with an empty DataFrame also works, provided you again use reset_index(drop=True) for the same reason:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'y': [7, 8, 9, 4, 2, 3],
    'x': [1, 3, 5, 7, 1, 0],
})
df_pull = pd.DataFrame()
for i in np.arange(10):
    df_pull[i] = df['x'].sample(frac=1).reset_index(drop=True)
print(df_pull)
0 1 2 3 4 5 6 7 8 9
0 1 7 1 1 1 5 3 5 3 1
1 7 1 5 5 0 1 1 1 7 7
2 5 0 0 7 1 3 5 3 1 5
3 3 3 3 0 3 0 7 1 1 3
4 0 1 7 1 5 7 1 7 5 1
5 1 5 1 3 7 1 0 0 0 0
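Because sample draws a new random permutation on every call, the printed table will differ from run to run. If repeatable draws are wanted, Series.sample accepts a random_state argument. The sketch below is an addition, not part of the original answer: it seeds each draw with the loop counter, and pull_samples is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    'y': [7, 8, 9, 4, 2, 3],
    'x': [1, 3, 5, 7, 1, 0],
})

def pull_samples(s, n, seed=0):
    """Hypothetical helper: return n seeded shuffles of the Series s as columns 0..n-1."""
    cols = [s.sample(frac=1, random_state=seed + i).reset_index(drop=True) for i in range(n)]
    return pd.concat(cols, axis=1, keys=list(range(n)))

df_pull = pull_samples(df['x'], 10)
df_again = pull_samples(df['x'], 10)

# The same seeds produce the same shuffles
print(df_pull.equals(df_again))  # True
```

Changing seed gives a different, but again repeatable, set of columns.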