I have a list of data frames called date_group. I have 117 such frames, one for each day.
date_group[0]
looks like this.
x1 x2 x3 prob x5 date
0 1.0 1.0 20.0 0.05 90.0 2021-12-23
1 1.0 2.0 20.0 0.60 90.0 2021-12-23
2 1.0 3.0 20.0 0.01 90.0 2021-12-23
3 1.0 4.0 20.0 0.02 90.0 2021-12-23
4 1.0 5.0 20.0 0.01 90.0 2021-12-23
...
30 .............................................
date_group[1]
looks like this.
x1 x2 x3 prob x5 date
0 1.0 1.0 20.0 0.04 90.0 2021-12-24
1 1.0 2.0 20.0 0.61 90.0 2021-12-24
2 1.0 3.0 20.0 0.02 90.0 2021-12-24
3 1.0 4.0 20.0 0.03 90.0 2021-12-24
4 1.0 5.0 20.0 0.01 90.0 2021-12-24
...
29 ............................................
Each frame has the same date in every row but different prob values (probabilities).
I am trying to create a single data frame in the format below. I want to take the date and the entire prob column (30 values in each frame) and put them in their own row for each corresponding date.
date 0 1 2 3 4 ........ 29
1. 2021-12-23 0.05 0.60 0.01 0.02 0.01 ...........
2. 2021-12-24 0.04 0.61 0.02 0.03 0.01 ...........
.....
So far I have written this code.
feed = pd.DataFrame(columns=['date',0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29])
for i in range(0, len(date_group)):
    feed.loc[i,'date'] = date_group[i].date.iloc[0]
    feed.loc[i] = date_group[i]['prob'].reset_index()['prob']
But my output looks like this,
date 0 1 2 3 4 5 6 7 8 ... 20 21 22 23 24 25 26 27 28 29
0 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.02 0.0 0.0 0.0 0.0 0.0 0.05
1 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.02 0.0 0.0 0.0 0.0 0.0 0.05
2 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.02 0.0 0.0 0.0 0.0 0.0 0.05
3 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.02 0.0 0.0 0.0 0.0 0.0 0.05
4 NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.01 0.0 0.0 0.0 0.0 0.0 0.04
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
112 NaN 0.32 0.5 0.68 0.79 0.98 0.79 0.99 0.99 0.99 ... 0.99 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
113 NaN 0.29 0.45 0.62 0.72 0.89 0.72 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
114 NaN 0.26 0.39 0.55 0.74 0.9 0.74 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
115 NaN 0.24 0.34 0.49 0.76 0.91 0.76 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
116 NaN 0.21 0.29 0.43 0.68 0.82 0.68 1.0 0.9 1.0 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
The dates are empty but everything else is there.
How do I get the dates in as well and make the frame look like I want it to in my example above?
Here is one way to do it.
Concatenate all the DataFrames together, then use pivot to create the desired table. Here I named the first DataFrame df and the second df2, and combined them into df3.
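# create a list of all the DFs; reset_index() turns each row position into an 'index' column
df_list = [df.reset_index(),
           df2.reset_index()]
df3 = pd.concat(df_list)
# one row per date, one column per original row position
df3.pivot(index='date',
          columns='index',
          values='prob').reset_index()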
Or, without creating df3:
# create a list of all the DFs
df_list = [df.reset_index(),
           df2.reset_index()]
pd.concat(df_list).pivot(index='date',
                         columns='index',
                         values='prob').reset_index()
index date 0 1 2 3 4
0 2021-12-23 0.05 0.60 0.01 0.02 0.01
1 2021-12-24 0.04 0.61 0.02 0.03 0.01
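The same idea should carry over to the whole date_group list from the question without naming each frame. Below is a minimal sketch under some assumptions: the two small frames are made-up stand-ins for the real 117 (only the prob and date columns are included, five rows each), and the names stacked and feed are just illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the real list of 117 frames: two days, five rows each.
date_group = [
    pd.DataFrame({"prob": [0.05, 0.60, 0.01, 0.02, 0.01], "date": "2021-12-23"}),
    pd.DataFrame({"prob": [0.04, 0.61, 0.02, 0.03, 0.01], "date": "2021-12-24"}),
]

# reset_index(drop=True) makes sure each frame is numbered 0..n-1;
# the second reset_index() turns that numbering into an 'index' column.
stacked = pd.concat(
    [g.reset_index(drop=True).reset_index() for g in date_group],
    ignore_index=True,
)

# One row per date, one column per original row position.
feed = stacked.pivot(index="date", columns="index", values="prob").reset_index()
feed.columns.name = None  # drop the leftover 'index' label on the columns

print(feed)
```

As for the loop in the question, the dates most likely end up as NaN because feed.loc[i] = ... assigns a Series whose labels are 0 to 29. pandas aligns that Series on the column labels, there is no 'date' label in it, so the date set on the previous line is overwritten with NaN.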